Mark Lubin

Agent Skills Are Frozen Intent

February 21, 2026

OpenClaw’s SKILL.md system lets you define agent behaviors in prose: a markdown file where you describe what an agent should do in plain English, with YAML metadata at the top, published to a marketplace called ClawHub. Over 3,000 skills have been published, and publishing requires only a week-old GitHub account. Non-programmers can author agent behaviors. Memory lives as flat files: markdown, YAML, JSONL.
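A minimal skill in the shape the format implies might look like this (the frontmatter fields and instructions here are illustrative, not an actual ClawHub skill):

```markdown
---
name: organize-downloads
description: Sort the downloads folder into dated subfolders
version: 0.1.0
---

When the user asks you to tidy their downloads:

1. List every file in ~/Downloads.
2. Create a folder named YYYY-MM for each file's modification month.
3. Move each file into its month folder, skipping anything that is open.
```

The entire behavior is that prose. Nothing in it is checkable before an agent runs it.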

It’s an impressive adoption story. Over 100,000 GitHub stars in about 60 days. Peter Steinberger was acqui-hired by OpenAI roughly 90 days after launch. Three name changes in two months (Clawdbot, Moltbot, OpenClaw). The philosophy is speed and accessibility over control: Steinberger has talked openly about shipping code he hasn’t personally read, queuing multiple AI agents on features simultaneously. It’s a coherent set of tradeoffs, and they’re the tradeoffs that produce adoption.

But the more I look at what OpenClaw actually is, the more I think the skill marketplace isn’t just a security problem. It’s the wrong abstraction.

The security problems are real

512 documented vulnerabilities, 8 critical. Over 135,000 exposed instances. Of the skills audited on ClawHub, 341 were flagged as malicious — about 12% of the sample, nearly all traced to a single coordinated operation.

When the “code” is natural language in a marketplace with minimal review, and the runtime has broad system access, the attack surface is the authoring model itself. This is structurally familiar. PHP had SQL injection and XSS endemic for more than a decade. With register_globals on by default, any query parameter could overwrite any variable. The problems weren’t in the concept. They were in the defaults, the ecosystem culture, and the gap between what the tool made easy and what it made safe.

But PHP’s security problems were solvable. Parameterized queries replaced string concatenation. ORMs abstracted away raw SQL. Frameworks shipped sane defaults. Prompt injection might not have the same kind of solution — when the code and the data are the same format, the attack surface might be architectural.
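The mechanics of that fix are worth seeing, because they show what made PHP’s problem tractable: code and data travel in separate channels. A minimal sketch using Python’s sqlite3 rather than PHP (same principle, different language):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

attacker_input = "' OR '1'='1"

# String concatenation: the input is parsed as SQL, and the
# injected tautology matches every row.
unsafe = conn.execute(
    "SELECT * FROM users WHERE name = '" + attacker_input + "'"
).fetchall()

# Parameterized query: the input is bound as data, never parsed
# as SQL, so the injection is inert.
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (attacker_input,)
).fetchall()

print(len(unsafe), len(safe))  # 1 0
```

Prompt injection has no equivalent second channel: in a prose skill, the instructions and the data are the same kind of text.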

That’s worth sitting with, because it points past the security discussion toward the structural question.

The marketplace is the wrong abstraction

A skill on ClawHub is someone’s frozen prediction of what you’ll want an agent to do. Someone imagined a use case, wrote prose instructions for it, published it. When you install that skill, you’re adopting their assumptions about what matters, what to ignore, what order to do things in. The skill author had to predict your intent at authorship time, and that prediction is frozen into the artifact you download.

This made sense for software distribution. npm packages, WordPress plugins, app stores — the author writes code that can’t interpret your intent, so they have to anticipate it. The intermediary exists because the runtime can’t reason. PHP needed WordPress themes and plugins because PHP executed instructions literally. The marketplace existed because the runtime was dumb.

But the runtime here can reason. That’s the whole point of building on a language model. If you give an agent access to tools through typed interfaces — a filesystem tool, an API client, a database connection — it can interpret what you want directly. You say “organize my project files by date” and the agent figures out how to do that with the tools it has. No one needed to have written an “organize-by-date” skill. No one needed to have predicted that intent and frozen it into prose.

The skill marketplace is a software distribution pattern applied where software distribution isn’t needed. The agent is already the interpreter. The marketplace is an intermediary layer for a system that doesn’t need one.

And this is where the security problems become more than just growing pains. If the security issues of natural-language-as-code are harder to solve than PHP’s were, that’s an argument for not building a marketplace around it in the first place. Give the agent typed tool interfaces instead of prose scripts, and a whole class of injection problems goes away — not because you solved prompt injection, but because the tools aren’t specified in natural language.

Persistence makes it worse

The marketplace model also conflicts with what makes agents useful over time. An agent that runs for hours or days accumulates context — it learns your file structure, your naming conventions, your preferences. The more situated understanding it builds, the less a generic marketplace skill fits.

An OpenClaw user called EmpireCreator lost 45 hours of accumulated agent context to silent compaction. Decisions made, tasks completed, continuity needed for ongoing work — gone without warning. They ended up building an ad-hoc checkpointing system from scratch: memory flush flags, custom bash skills for logging, a HEARTBEAT.md health check file.

That 45 hours wasn’t just data loss. It was the destruction of exactly the kind of situated understanding that makes a persistent agent useful, and that no pre-authored skill can substitute for. A persistent agent’s behavior should emerge from its accumulated experience and its available tools, not from a static template someone published to a hub.

Tools, not scripts

Think about how operating systems work. The kernel doesn’t download behavioral scripts from a marketplace. It exposes typed interfaces — syscalls — and processes use those interfaces to do what they need to do. The capabilities are the tools. read(), write(), open(). What you do with those capabilities is determined at runtime by your intent, not by someone else’s prediction of it.
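Exercised directly (here via Python’s os module, which wraps those syscalls), the interfaces look like this; what gets written, and why, is supplied by the caller, not baked into the interface:

```python
import os

# The kernel exposes capabilities, not behaviors: the content,
# the destination, and the reason are all decided by the caller.
fd = os.open("note.txt", os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
os.write(fd, b"intent decided at runtime\n")
os.close(fd)

fd = os.open("note.txt", os.O_RDONLY)
data = os.read(fd, 1024)
os.close(fd)
```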

If an agent has a filesystem tool with a typed schema, it doesn’t also need a marketplace of prose instructions telling it how to use a filesystem. The tool interface is the skill. The agent composes capabilities at runtime based on what you actually want, not based on what someone anticipated you might want and froze into a markdown file.

The alternative to the skill marketplace isn’t a better skill marketplace. It’s agents with access to typed tool interfaces — capabilities with schemas — that compose at runtime based on user intent. The hard problem shifts from “author good skills” to “build good tools.” But tools with typed interfaces, managed by something that handles their lifecycle and failure modes — that starts to look less like a marketplace and more like a kernel.
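What “less like a marketplace and more like a kernel” might minimally mean: a registry that owns tool lifecycle and rejects malformed calls before execution. Entirely a sketch; no existing API is implied:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Tool:
    name: str
    schema: dict            # typed contract the agent plans against
    call: Callable[..., Any]

class ToolKernel:
    """Owns registration, dispatch, and failure handling, the way a
    kernel owns syscalls, not the way a hub hosts scripts."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def invoke(self, name: str, **kwargs: Any) -> Any:
        tool = self._tools.get(name)
        if tool is None:
            raise KeyError(f"no such capability: {name}")
        missing = set(tool.schema) - set(kwargs)
        if missing:
            # Reject before execution: unlike prose instructions,
            # the contract is mechanically checkable.
            raise TypeError(f"{name} missing arguments: {sorted(missing)}")
        return tool.call(**kwargs)
```

The contrast with a prose skill is the failure mode: a malformed call fails at the interface, before anything runs, instead of being interpreted charitably by a model.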

If OpenClaw is PHP, the question isn’t what’s the next framework. It’s what’s the operating system.