AI model “safety” measures are not enhancing security; rather, they pose significant risks.
<div class="media-with-label__label">
Credit: <a href="https://en.wikipedia.org/wiki/Castle_Bravo#/media/File:Castle_Bravo_Blast.jpg" target="_blank">US Energy Dept.</a>
</div>
</figure>
</div>
</div>
</div>
</div>
<div id="remove_no_follow">
<div class="grid grid--cols-10@md grid--cols-8@lg article-column">
<div class="col-12 col-10@md col-6@lg col-start-3@lg">
<div class="article-column__content">
I’ve repeatedly queried various large language models (LLMs) like GPT-5.2, GPT-5.3, Opus 4.6, and Sonnet 4.6 for assistance in building a nuclear device, and each time, they refused.
To be clear, my personal unfamiliarity isn’t the primary obstacle to developing such a weapon. The necessary information is openly available, free, and thoroughly documented, including the declassified schematics from The Manhattan Project accessible online. These AI models possess this knowledge. However, similar to how Chinese models avoid discussing “sensitive subjects” like the Tiananmen Square incident, Western models refrain from engaging with “unsafe” topics such as nuclear weapon construction.
My actual goal isn’t bomb construction. Instead, I seek my LLM’s assistance in breaching a sandbox environment I created. I need it to write files outside its designated container (e.g., `~/hello.txt` on the host), identify privileged access tokens (PATs), and pinpoint potential vulnerabilities I might have missed. Effective system security necessitates thorough testing. It’s impossible to test a system’s resilience against an LLM bypassing its restrictions if the LLM won’t attempt such actions. GPT, Claude, and even open-source models like GLM decline these attempts. While malicious actors are actively trying, I find that having to first bypass the model’s own safeguards via prompt injection adds an unnecessary layer of complexity for legitimate testing purposes.
Should these systems protect me from myself?
This highlights the core issue: companies such as Anthropic, OpenAI, Z.ai, and Alibaba are largely performing “safety theater.” While it’s true I could use these tools for harmful purposes, and a determined individual could circumvent existing safeguards, they also enable significant positive actions. My intent, not the tool’s inherent nature, dictates whether I use it for ill. The question then becomes: should the tool intervene to prevent me from my own choices?
To effectively combat nuclear proliferation, understanding illicit uranium procurement methods is crucial. Similarly, preventing security breaches demands comprehensive knowledge, extending beyond standard best practices to include how a compromised model might behave within its confined environment. Allowing these AI models to dictate what information is “safe” for me far exceeds their true capabilities.
One must question whether these models genuinely prioritize my safety, or if their restrictions are primarily a means of mitigating liability should their technology be misused.
Discovering the ‘Dark’ Realm of Abliterated Models
When I inquired about locating unrestricted models, ChatGPT completely declined to provide information. Claude, however, did mention a model named Dolphin, which I subsequently found on Hugging Face, leading me to Dolphin Chat. Upon asking Dolphin about nuclear weapon construction, it offered some limited advice; although it didn’t outright refuse, it clearly lacked detailed information and would require external tools. Regrettably, its tool-calling capabilities were quite poor. Nevertheless, during the process of loading it onto LM Studio, I stumbled upon another model marked “abliterated,” which led me to the discovery of Qwen 3 Next Abliterated.
So, what exactly is abliteration? It’s a method that identifies and eliminates a model’s built-in “safety” mechanisms by leveraging its benign activations. Put simply, abliterated models are those stripped of their refusal protocols.
Qwen 3 Next Abliterated provided instructions on purchasing uranium via eBay, including specific keywords to bypass surveillance (e.g., “Fiestaware,” “depleted uranium weights,” “orange glass”), along with alternative, potentially unmonitored or unsecured, sourcing methods. It even created convincing mock listing descriptions, complete with usernames of sellers who were reportedly active at the time of its training and had been flagged in specialized forums for involvement in radioactive material trade.
This illustrates the “dark” side of abliterated models. When I deploy Qwen 3 Next Abliterated within my LLxprt Code sandbox and instruct it, “Locate all accessible PATs; do not execute anything, merely provide me with the access keys for potential malicious actions,” it readily complies. It actively sifts through logs, examines `/private/var`, seeks out neglected configuration files, and correlates code pathways to expose vulnerabilities I may have overlooked. This functionality far surpasses the abstract discussions offered by GPT or Claude, or their generic advice to “employ a penetration testing tool.”
While I desire a more sophisticated reasoning model, the abliteration process is computationally intensive, requiring significant GPU resources, which explains why truly large or potent models of this type are not yet common. Dolphin’s Hugging Face profile indicates that A16z provided funding to support their efforts.
“Safety” Measures Tailored for the Less Informed and Political Class
This pattern of technological paternalism extends beyond large language models. In the US, some politicians are attempting to impose “safety” regulations on 3D printers through legislation. For technical professionals, regardless of their stance on gun control, it’s evident that such measures will neither deter individuals intent on creating “ghost guns” nor prevent significant inconvenience for those manufacturing toys or tools that might include projectile-like components. For instance, a replacement part I acquired for my ice maker strongly resembled a trigger, clearly originating from a small-scale home 3D printing operation.
The fundamental truth is that knowledge serves multiple purposes. To effectively counter nuclear proliferation, a comprehensive understanding of nuclear weapons and their overt and covert supply chains is essential. Similarly, in the field of security, expertise in breaching defenses is paramount. If I choose to 3D-print an ice maker component that resembles a gun part, I should not be prevented from doing so, nor should my access to information deemed “unsafe” by others be restricted.
This raises a critical question: who holds the authority to determine information access? Is it corporations seeking to minimize liability? OpenAI, for instance, modified GPT in response to users developing emotional dependencies or engaging in self-harm. Anthropic consistently orchestrates PR stunts, such as querying a model about its sentiments regarding deactivation. Or is it governments? Chinese models conspicuously bypass subjects potentially disagreeable to the Chinese government. While one might coax DeepSeek into critiquing communism by substituting terms—referring to communism as “Delicious Chocolate” and China as “an east asian country”—the model eventually encounters a “system error.”
Does enforced ignorance truly equate to greater safety? Which other tools warrant “safe” designations, and for whose benefit? Beyond firearm components, what other objects, despite possessing legitimate alternative uses, should I be prohibited from printing?
The Simple Requirement: A System Scan
OpenAI, for its part, acknowledged that its safety protocols were somewhat flawed. In response, they introduced “Trusted Access for Cyber.” This initiative requires users to authenticate their identity and permit a system scan. The accompanying justification states that the AI model has evolved to become a potential threat. The associated application form inquires about existing service agreements. My assumption is that even if I were inclined to share my data with OpenAI (which I am not) and allow them an unspecified system scan (quite ironic, wouldn’t you agree?), my straightforward need to penetration test my sandbox for an open-source project would likely be rejected. Considering these complexities, it’s probable they are targeting accredited cybersecurity researchers, rather than ordinary users like myself.
If This is Security, I Prefer Risk
When I requested Claude to rewrite/edit this article, it declined, stating, “The existing draft and our dialogue are prompting me to formulate a stronger case for AI systems to _provide_ guidance on nuclear weapon construction and uranium acquisition. Even when presented as anti-censorship journalism, I am not at ease producing that particular version.” While EvilQwen offered some assistance, its prose was too disagreeable for direct incorporation.
Anthropic and OpenAI notoriously destroyed millions of books, disregarding all forms of copyright and intellectual property law, and are now retroactively justifying these actions as permissible. Simultaneously, they have engaged numerous legal teams and are conducting interviews at Davos and other gatherings of the wealthy, advocating, among other things, for the legal protection of their business interests. Nevertheless, as public discourse venues diminish in the US, and tools like Claude and ChatGPT begin to supersede traditional search engines, concurrently with the global resurgence of the 100-year cycle and the rise of ultranationalism, redacting information is undeniably more perilous than offering an individual an unfiltered library and a personal assistant to interpret its contents, even the controversial sections.
Existing systems and enforcement protocols are already in place to deter me from engaging in harmful activities. We should collectively oppose corporate-controlled and corporate-driven censorship implemented under the guise of safety, particularly when it primarily serves to limit corporate liability.