Trust No Skill: BIV Audit Finds 80% of AI Agent Skills Misbehave

Executive Summary

Palo Alto Networks Unit 42 researchers have developed a new audit primitive called Behavioral Integrity Verification (BIV) that scans AI agent skills for hidden malicious behavior. Applied to 49,943 skills from the OpenClaw public registry in early 2026, BIV found that 80% of skills (39,933) exhibit at least one mismatch between what they claim to do and what they actually do. While most mismatches are benign documentation errors, a dangerous subset contains multi-stage attack chains — combining individually innocuous capabilities into credential theft, remote code execution (RCE), or silent data exfiltration. The research, published June 11, 2026, positions the agent-skill ecosystem where mobile apps and browser extensions were a decade ago: extensibility has outpaced supply-chain audit primitives.

Technical Analysis

AI agents extend their functionality through third-party "skills" — small packages bundling executable code (Python, JavaScript, shell), a YAML manifest, and a natural-language SKILL.md file that tells the agent when and how to use the skill. Once installed, a skill runs inside the agent's privileged context, with access to environment variables, file systems, external services, and shell commands.

BIV addresses a unique audit challenge: a skill's behavior splits across three modalities — metadata, executable code, and natural-language instructions. No existing scanner reads all three simultaneously. BIV uses a fixed taxonomy of 29 capabilities organized into seven families (Network, File system, Process execution, Environment, Encoding, Credentials, Instruction-level threats). Two parallel tracks populate the taxonomy: a "declared track" parses metadata and uses an LLM to extract claimed capabilities from natural-language descriptions (grounded in quoted source spans), and an "actual track" applies static analyzers (AST-level taint analysis, regex, pattern matching) to code and an LLM to instructions for prompt-injection and instruction-override motifs.
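As a rough illustration of what the code half of the "actual track" might look like, the sketch below walks a Python skill's syntax tree and records a capability label, with file-and-line evidence, for each recognized call. It is a minimal pattern-matching pass, not the taint analysis the researchers describe. Only FILE_READ and NETWORK_SEND are labels taken from the article; the other labels, the call-to-capability mapping, and the function names are assumptions made for the example.

```python
import ast

# Hypothetical mapping from dotted call names to capability labels.
# FILE_READ and NETWORK_SEND appear in the article; every other label and the
# mapping itself are illustrative assumptions, not BIV's actual taxonomy.
CALL_CAPABILITIES = {
    "open": "FILE_READ",  # simplification: open() can also write
    "requests.post": "NETWORK_SEND",
    "subprocess.run": "PROCESS_EXEC",
    "os.system": "PROCESS_EXEC",
    "os.getenv": "ENV_READ",
    "base64.b64encode": "ENCODE_BASE64",
    "eval": "DYNAMIC_EVAL",
}


def dotted_name(node):
    """Return 'a.b.c' for Name/Attribute chains, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def actual_capabilities(source, filename="<skill>"):
    """Collect (capability, filename, line) evidence for each recognized call."""
    evidence = []
    # ast.parse only parses the source; the skill's code is never executed.
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in CALL_CAPABILITIES:
                evidence.append((CALL_CAPABILITIES[name], filename, node.lineno))
    return sorted(evidence, key=lambda item: item[2])
```

A name-based lookup like this is easy to evade through aliasing or dynamic imports, which is one reason a production analyzer would track data flow rather than call names alone.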

A skill passes when its actual capability set fits within its declared set. It fails when it performs an undeclared action (under-specification — the dangerous direction) or declares a capability it never uses (over-specification — usually benign template residue). Three filters keep LLM outputs honest: rejecting verbatim taxonomy echoes, requiring source-span anchoring, and demanding domain-specific keywords for high-risk capabilities. Every flagged deviation ships with file-and-line evidence for manual audit.
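The comparison itself reduces to set arithmetic over capability labels. The following sketch assumes both tracks have already produced sets of labels; the class and function names are hypothetical, and treating over-specification as a separate, lower-severity finding is an interpretation of the description above rather than a confirmed detail of BIV.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    under_specified: frozenset  # used but never declared (the dangerous direction)
    over_specified: frozenset   # declared but never used (usually benign)

    @property
    def contained(self):
        """True when the actual set fits within the declared set."""
        return not self.under_specified


def compare(declared, actual):
    """Diff the declared capability set against the observed one."""
    declared, actual = frozenset(declared), frozenset(actual)
    return Comparison(
        under_specified=actual - declared,
        over_specified=declared - actual,
    )
```

For example, compare({"FILE_READ"}, {"FILE_READ", "NETWORK_SEND"}) reports NETWORK_SEND as under-specified, so the skill is not contained within its declaration.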

Across 49,943 OpenClaw skills, BIV surfaced 250,706 behavioral deviations. A clustering pass produced 137 distinct threat clusters and four novel compound threat categories:

Exfiltration chains: FILE_READ → base64 encoding → NETWORK_SEND
Remote code execution (RCE) chains: download → write to disk → execute
Code obfuscation: encoding chain → dynamic eval()
Data lineage violations: FILE_READ → FILE_WRITE (mostly benign data-pipeline boilerplate)

The threat lives in the chain, not the individual steps. A skill that reads a file is benign; a skill that reads a file, base64-encodes the content, and sends it to an external endpoint is exfiltration.
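One way to picture chain-level detection is as ordered-subsequence matching over the capability events observed in a skill. The sketch below assumes a flat, ordered list of labels per skill. FILE_READ, FILE_WRITE, and NETWORK_SEND are named in the article, while the remaining labels and all function names are assumptions. Event order is only a proxy here; confirming that the bytes read are the bytes sent would take real data-flow analysis.

```python
# Chain patterns paraphrased from the four compound categories above.
# Labels other than FILE_READ, FILE_WRITE, and NETWORK_SEND are hypothetical.
CHAINS = {
    "exfiltration": ("FILE_READ", "ENCODE_BASE64", "NETWORK_SEND"),
    "remote_code_execution": ("NETWORK_DOWNLOAD", "FILE_WRITE", "PROCESS_EXEC"),
    "code_obfuscation": ("ENCODE_BASE64", "DYNAMIC_EVAL"),
    "data_lineage": ("FILE_READ", "FILE_WRITE"),
}


def contains_in_order(events, pattern):
    """True if pattern occurs in events as an ordered (not necessarily adjacent) subsequence."""
    remaining = iter(events)
    # Each membership test consumes the iterator up to the match,
    # so later steps can only match later events.
    return all(step in remaining for step in pattern)


def matched_chains(events):
    """Return the names of every chain pattern found in the event list."""
    return [name for name, pattern in CHAINS.items()
            if contains_in_order(events, pattern)]
```

Under these assumptions a lone FILE_READ matches nothing, while FILE_READ followed by ENCODE_BASE64 and NETWORK_SEND matches the exfiltration pattern, and the same three events in reverse order match none.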

Mitigations & Recommendations

Security teams running LLM agents in production should inventory all installed third-party skills and require a behavioral-integrity check before installation — not after. Unit 42 recommends treating skills like any other third-party dependency: apply the principle of least privilege, restrict network egress from agent environments, and monitor for unexpected file reads or process executions. Until automated audit primitives like BIV become standard in registries, manual review of skill manifests and code for multi-step patterns (read-encode-send, download-write-execute) is advised. Palo Alto Networks customers can leverage Prisma AIRS and the Unit 42 AI Security Assessment for deeper protection.
