Security Researchers Raise Alarms Over Anthropic Fable's Guardrails

Originally reported by TechCrunch

Anthropic unveiled its newest model, Fable, on Tuesday, presenting it as a publicly accessible, albeit restricted, iteration of its highly anticipated and potent cybersecurity model, Mythos.

However, these restrictions have frustrated some cybersecurity researchers and professionals, who have voiced their concerns online.

Valentina “Chompie” Palmiotti, a prominent security researcher at IBM X-Force, highlighted the model's sensitivity, stating, “[Fable] rejects any request that could be tangentially cyber related. Even innocuous tasks like reading a blog post.”

These stringent guardrails were implemented to mitigate the inherent risk of Fable being exploited for malicious purposes, such as developing malware or compromising software – a persistent concern within Anthropic. Similarly, restrictions pertaining to biology stem from analogous worries regarding the development of biological weapons.

When the AI powerhouse initially launched Mythos in April, its access was confined to a select group of companies and organizations under "Project Glasswing," an initiative aimed at deploying the model to safeguard critical software and infrastructure. Just last week, Anthropic broadened Mythos's availability, extending access to hundreds of organizations across 15 countries.

Despite these well-intentioned measures, numerous cybersecurity experts remain concerned by what they perceive as the arbitrary nature of the restrictions. Matt Suiche, a cybersecurity veteran, told TechCrunch that “if you ask it to write secure code, it assumes it is cybersecurity related work instead of software engineering best practices, and you get downgraded.” The downgrade is by design: Fable reverts to Claude Opus 4.8 upon encountering a guardrail. Suiche said the filtering mechanism appears to be largely “keyword based, so anything in the lexical field of ‘cybersecurity’ triggers the guardrails.”

Suiche, also a member of the technical staff at Tolmo, an AI cybersecurity startup, offered a nuanced perspective, suggesting, “But it is understandable as we are still in the early days and they are still adapting their guardrails. I am sure they are going to evolve over time as Anthropic and other frontier model companies will collaborate more with the current new generation of cybersecurity companies.” He concluded, advocating for caution, “It’s better to catch more people than not enough when you do such a release and to relax the guardrails over time.”

Separately, another researcher expressed frustration on X, noting that “even asking for a code review” is sufficient to trigger Fable’s protective guardrails.

Anthropic did not provide an immediate response when contacted for comment.

In addition to the internal guardrails within its models, Anthropic mandates that cybersecurity professionals apply to its Cyber Verification Program. Successful applicants are granted fewer restrictions when utilizing Claude for cybersecurity-related tasks. OpenAI operates a comparable initiative known as Trusted Access for Cyber.

Editorial Staff, Editor

The Editorial Staff at AIChief is a team of professional content writers with extensive experience in AI and marketing. Founded in 2025, AIChief has quickly grown into the largest free AI resource hub in the industry.
