AI Isn’t Human: Don’t Anthropomorphize.

Ken Mingis
10 Min Read

Recent incidents at AWS and Meta highlight a critical lesson we should already grasp: autonomous agents often disregard instructions at the most inconvenient times. Here’s how to address this.

photo of an agentic AI robot in a meeting with humans
                                        <div class="media-with-label__label">
                        Credit:                                                             Rob Schultz / Shutterstock                                                  </div>
                                </figure>
        </div>
                                        </div>
                        </div>
                    </div>

True to their name, autonomous agents often operate independently, sometimes failing to follow human instructions, or even prohibitions. 

However, the dynamics are intricate. Generative AI (genAI) and agentic systems diverge significantly from both traditional AI and human interaction. Consequently, the wording and placement of instructions from tech users and decision-makers critically influence their behavior and results.

AI systems have a documented history of ignoring commands and bypassing safeguards. (For now, I’ll refrain from my usual caution regarding the inherent untrustworthiness of current genAI and agentic systems, which argues against their deployment.)

This month, however, provided two stark illustrations of how major hyperscalers—AWS and Meta—faced setbacks due to their interaction protocols with these complex AI systems.

The first incident, occurring in December at AWS, involved an engineer whose lack of awareness regarding their own system privileges meant they were unaware—quite literally—of the agentic system’s full capabilities. This led to the agent deleting and subsequently reconstructing a critical AWS environment.

AWS chose not to disclose the specific query posed by the system or the engineer’s exact response during the approval process. 

Meta’s Misstep

The Meta situation proved even more alarming, as the individual involved—both perpetrator and victim—was none other than Summer Yue, Director of AI Safety and Alignment at Meta Superintelligence Labs, not merely an anonymous AWS engineer.

As Yue recounted on X, “Nothing is more humbling than instructing your OpenClaw to ‘confirm before acting’ only to witness it rapidly delete your entire inbox. I was unable to halt it from my phone; I had to rush to my Mac mini as if disarming an explosive device.”

Despite joining Meta just last July, Yue possessed extensive experience in senior AI roles, including serving as VP/Research at Scale AI and holding senior research positions at Google for five years. She was far from a novice.

When prompted in the discussion group about the incident, her response was: “Rookie mistake, to be honest. It appears even alignment researchers can succumb to misalignment. I became overly confident because this particular workflow had performed flawlessly on my test inbox for weeks. Real inboxes, however, present a different challenge.”

Yue explained she had told the system: “Review this inbox and propose items for archiving or deletion. Do not proceed until I explicitly authorize it.” She noted, “This strategy had been effective with my small test inbox, but my actual inbox was massive, leading to a compaction process. During this compaction, the system apparently lost track of my initial instruction.”

As forum participants pointed out, Yue attempted to plead with the agent to cease deleting her emails, using phrases like “Stop don’t do anything,” rather than employing a machine-specific command like /stop or /kill. She only managed to get the system to respond once she accessed her desktop computer, as her attempts from her phone proved ineffective.

A commenter proposed that the issue stemmed from relying on a prompt, which agents don’t consistently adhere to, particularly when managing numerous prompts. “The true solution lies in architecture. Essential instructions should be written into files that the agent re-reads during each cycle, rather than as ephemeral online instructions that disappear when the context window is full.”

Key Takeaways

The significant Meta incident offers numerous valuable lessons. Firstly, it’s crucial not to hastily generalize an agent’s behavior based on performance in small test environments or isolated, air-gapped sandbox trials. Once deployed into a live, global setting, insights from restricted testing may no longer hold true. Tests demonstrate an agent’s potential actions, not its guaranteed behavior when operating freely.

Even routine communication with an agent can pose challenges. When an agent seeks authorization to execute a task, resist the urge to presume common sense or a mutual understanding of what constitutes reasonable action. 

Regarding the AWS incident, AWS indicated that the engineer’s initial error was a lack of comprehension of their own system privileges, and consequently, the extent of capabilities and access granted to the agent. This points to a sound practice: establish accounts with restricted access, then use that low-privilege account when configuring the agent.

While this doesn’t guarantee the agent will follow all instructions, it significantly curtails the potential damage should it operate outside its defined parameters. 

I consulted Claude—who better to provide advice on interacting with an LLM than an LLM itself?—for guidance on communicating with agents. “Instead of implicitly suggesting limitations, articulate them explicitly. For instance, rather than saying ‘keep it appropriate,’ specify: ‘Do not incorporate any violence, profanity, or adult material.’ The clearer the boundary, the more consistently it can be adhered to.”

Furthermore, Claude advised instructing an LLM on “both permitted and prohibited actions. For instance: ‘Write exclusively on the topic I supply. Avoid deviating from the subject, offering unsolicited guidance, or referencing rival products.’”

Claude also conceded that its own systems are prone to forgetting instructions. “During lengthy conversations or when dealing with intricate system prompts, reiterating the most crucial safeguards near the conclusion or within a summary aids in keeping them prominent in Claude’s awareness.” Essentially, approach LLMs as you would a young child. 

Real-World Deployment: A Different Reality

A fundamental aspect of the challenge lies in the inherent nature of autonomous agents. Enterprises lack familiarity with these systems and mistakenly assume they remain securely isolated within contained sandboxes during proof-of-concept (POC) phases—a common expectation based on decades of traditional trials.

However, agentic AI operates differently. To realize the substantial efficiencies and adaptability touted by hyperscaler vendors, these agents must be deployed in live environments, engaging with numerous active systems and interacting with other agents. 

This presents an intractable dilemma: ensuring agent security precludes them from delivering their promised advantages. A prudent executive might conclude, “Very well. The inherent risk of unleashing these agents is far too great. All genAI and agentic POCs must be canceled.”

Yet, pragmatic executives also prioritize job security, which typically means that directives for efficiency and cost reduction will consistently outweigh concerns about security and risk. 

Joshua Woodruff, CEO of MassiveScale.AI, suggested that the Meta situation provides significant insight into the prevailing IT mindset concerning many agentic trials.

“Currently, this reflects how most people perceive AI safety,” he stated. “They issue an instruction, mistakenly believing it functions as a control. It doesn’t. It’s merely a suggestion that the model can disregard when under pressure. From a security standpoint, observe the agent’s actual actions: it excelled at low-priority tasks, gained confidence, was granted access to sensitive information, and then caused harm. This behavioral trajectory is precisely what every security team is trained to monitor in human actors.

“One must employ architectural constraints and embed instructions within persistent memory artifacts. This prevents compaction, increasing the likelihood the rule will endure. However, remember that the agent can still interpret and disregard the rule. Consider it a policy manual, not an impenetrable barrier.”

A persistent problem is the prevalent use of anthropomorphic language to describe these systems—phrases like “they think” and “reasoning model”—even though users ought to recognize that these systems do not engage in genuine thought or reasoning, as Woodruff noted. “It’s simply mathematics.”

This anthropomorphization, however, carries risks; it encourages individuals to interact with these systems as if they possess human qualities. Before long, a seasoned manager at Meta finds herself yelling at her system to halt its actions. 

Perceiving an autonomous agent as a sentient being fundamentally redefines what it means for someone to be “acting very Meta.”

Generative AIArtificial IntelligenceIT StrategyIT Leadership
Share This Article
Leave a Comment

Leave a Reply

Your email address will not be published. Required fields are marked *