We Still Train AI.


New research indicates agentic AI excels when given human-provided step-by-step instructions

Image: A green robot at a blackboard. Credit: charles taylor / Shutterstock

A recent study indicates that AI agents, while capable, need specialized procedural knowledge to carry out tasks effectively and cannot yet develop that knowledge on their own.

To assess how well agentic AI performs, the study's authors introduced a new benchmark called SkillsBench. It evaluates performance on 84 tasks across 11 fields, including healthcare, manufacturing, cybersecurity, and software engineering. Researchers ran each task under three conditions: no skills (the agent receives only basic instructions), curated skills (the agent is given relevant directories, code snippets, and other helpful resources), and self-generated skills (the agent starts with no skills but is prompted to develop its own).

Example tasks included auditing npm dependencies for security vulnerabilities and analyzing differential protein expression data from cancer cell lines.

Agents using curated skills performed best, outscoring agents with no skills by an average of 16.2 percentage points. This finding underscores AI's ongoing reliance on human input. The benefit was not universal, however: in 16 of the 84 tasks, the curated skills actually made results worse.
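To make the arithmetic behind a figure like "16.2 percentage points" concrete, here is a minimal Python sketch of how an average uplift and a count of regressed tasks could be computed from per-task pass rates. The task names and numbers are invented for illustration and are not taken from the study, and the data layout and function names are assumptions rather than the benchmark's actual format.

```python
from statistics import mean

# Hypothetical pass rates (in percent) per task under two conditions.
# These values are made up for illustration; they are NOT SkillsBench results.
pass_rates = {
    "task_a": {"no_skills": 20.0, "curated_skills": 50.0},
    "task_b": {"no_skills": 40.0, "curated_skills": 60.0},
    "task_c": {"no_skills": 70.0, "curated_skills": 60.0},
}


def uplift_by_task(rates):
    """Percentage-point change from no skills to curated skills, per task."""
    return {
        task: scores["curated_skills"] - scores["no_skills"]
        for task, scores in rates.items()
    }


def summarize(rates):
    """Return the mean uplift and the sorted list of tasks that got worse."""
    deltas = uplift_by_task(rates)
    average = mean(deltas.values())
    regressions = sorted(task for task, delta in deltas.items() if delta < 0)
    return average, regressions


average_uplift, regressed_tasks = summarize(pass_rates)
print(f"Average uplift: {average_uplift:.1f} percentage points")
print(f"Tasks where curated skills hurt: {regressed_tasks}")
```

A percentage-point difference is a plain subtraction of two rates, so a positive average can coexist with individual tasks where the curated condition scores lower, as with the hypothetical task_c here.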

The impact of curated skills varied considerably by field. Healthcare tasks saw the largest improvement, whereas software engineering tasks saw only a modest benefit.

Crucially, agents prompted to generate their own skills showed no performance gains, reinforcing the observation that human intervention remains essential for effective task completion by AI.
