The HomeLab Chronicles

Cutting Claude Code Skill Costs With Haiku Subagent Delegation

Claude Code skills are powerful, but expensive. You write a .claude/skills/my-skill/SKILL.md, and the entire skill runs on whatever model your session uses, typically Sonnet or Opus. That means a skill that just fills a template or generates short structured text burns primary-model tokens from start to finish. For skills you invoke frequently, that adds up.

I noticed something: my skills had two distinct phases. One phase required reasoning — gathering context, understanding intent, asking clarifying questions. The other was mechanical — filling a template with supplied facts, generating a short-form post, making a routing decision. The insight was simple: split them across models.

Context and Why This Matters

I maintain a homelab running Kubernetes across three clusters. I use Claude Code as part of my infrastructure workflow, writing skills for common tasks like drafting blog posts, writing incident runbooks, and routing work to other skills.

The problem was cost: every skill execution happened on Sonnet, even when the work was mechanical.

Claude Code has two execution primitives that most people don't use together: Skills (instruction files with no model field, running on the session's model) and Agents (subprocess files with a model: frontmatter field, running in their own context). The pattern is elegant: skills do the smart work on an expensive model, then delegate mechanical work to a cheap model via the Agent tool.

The Architecture: Interface vs. Generation

Here's the mental model:

Skills = Interface Layer

  • Understand user intent
  • Gather context from conversation
  • Ask clarifying questions
  • Prepare a fully-specified brief
  • Delegate execution, return results

Agents = Generation Layer

  • Receive a fully-specified brief (no guessing)
  • Fill templates with supplied facts
  • Generate structured output
  • Run on the cheapest appropriate model

The flow end to end:

User input
  ↓
Skill (Sonnet): understands intent, gathers facts
  ↓
Agent (Haiku): generates output from the brief
  ↓
Result returned to the user

The Non-Obvious Part: No Model Field on Skills

Skills have no model: field in their frontmatter. They always run on the session's model. You can't declare a skill to run on Haiku — it inherits whatever you're using.
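For contrast, a skill's frontmatter carries only identity and description. The sketch below is illustrative (field values are mine, paraphrased), and the point is what's missing: there is no model key to set.

---
name: linkedin-post
description: Drafts a LinkedIn post from conversation context
---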

Agents, on the other hand, have a model: field and run as subprocesses. When you invoke an agent via the Agent tool, Claude launches a fresh context window with that agent's instructions, on the specified model, and returns the output.

The key insight: Don't try to put cheap models into skills. Put cheap models into agents that skills delegate to.

Once you invert your thinking, everything clicks. Skills become orchestrators, agents become workers.

The Working Solution

Three agent files, three updated skills.

linkedin-writer.md

---
name: linkedin-writer
description: Generates a LinkedIn post from a structured brief
model: haiku
color: magenta
tools: []
---

tools: [] — principle of least privilege. Haiku only needs to generate text. No file access needed.
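Below the frontmatter sits the agent's instruction body, plain markdown. Paraphrased here as an illustrative sketch, not the literal file:

You will receive a brief containing: the core accomplishment, concrete
details, a tone preference, and a link. Write the post from the brief
alone; do not invent facts. Keep it under 200 words, lead with a hook,
and offer two alternative hooks at the end.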

incident-doc-writer.md

---
name: incident-doc-writer
description: Generates and saves a structured incident runbook
model: haiku
color: yellow
tools: ["Write", "Read"]
---

Gets Write access because it saves the runbook file to disk. Still Haiku.

task-router.md

---
name: task-router
description: Makes routing decisions for the orchestrator skill
model: haiku
color: cyan
tools: []
---

Pure routing logic. Receives context summary and user preference, outputs a structured routing decision. Haiku handles conditional logic fine.
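The contract between skill and router is structured text in both directions. A hypothetical exchange (shape illustrative, not a fixed schema):

Brief from the orchestrator skill:
  context: "etcd backup job failed overnight; user wants documentation"
  candidates: [incident-doc, linkedin-post]
  preference: "internal doc first"

Decision from task-router:
  route: incident-doc
  reason: "failure documentation maps to the runbook skill"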

Updated skill delegation (linkedin-post example)

## Execution
 
Once context is gathered, delegate generation to a subagent:
 
1. Compile a brief with: the core accomplishment, concrete details, tone preference, and link
2. Use the Agent tool with `model: haiku` to invoke the `linkedin-writer` agent
3. Pass the brief plus format rules
4. Return the Haiku-generated post to the user unchanged

The flow:

/linkedin-post
  Sonnet: asks context questions (~1000 tokens)
  Agent(model: haiku) → linkedin-writer
    Haiku: generates post + hook alternatives (~200 tokens)
  Sonnet: returns result to user

Cost difference: ~1000 tokens on Sonnet (context gathering) + ~200 tokens on Haiku (generation) instead of ~1200 tokens entirely on Sonnet.
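To make the arithmetic concrete, here is a back-of-envelope sketch in Python. The per-million-token prices are placeholders I chose for illustration, not current Anthropic list prices; the token counts are the estimates above.

```python
# Per-invocation cost comparison. Prices are illustrative placeholders
# (assumed $ per million tokens), not real Anthropic list prices.
SONNET_PER_MTOK = 15.00
HAIKU_PER_MTOK = 1.25

def cost(tokens: int, per_mtok: float) -> float:
    """Dollar cost of `tokens` at `per_mtok` dollars per million tokens."""
    return tokens / 1_000_000 * per_mtok

# All work on Sonnet vs. the split architecture.
all_sonnet = cost(1200, SONNET_PER_MTOK)
split = cost(1000, SONNET_PER_MTOK) + cost(200, HAIKU_PER_MTOK)

print(f"all-Sonnet: ${all_sonnet:.5f}")
print(f"split:      ${split:.5f}")
print(f"saved:      {1 - split / all_sonnet:.1%}")
```

On this single example the split saves a modest slice, because generation is only ~200 of ~1200 tokens; the savings grow with the share of tokens that move to the cheap model, which is why generation-heavy skills benefit most.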

Lessons Learned

1. Separation of concerns is cost-effective. Reasoning requires capability. Generation from a fully-specified brief doesn't. If you're paying Sonnet for template-filling, you're leaving money on the table.

2. Principle of least privilege applies to models too. Don't give agents tools they don't need. Don't give tasks models they don't need.

3. tools: [] is a real constraint, not a hint. An agent with no tools can't call Write or Bash even if asked. This is good.

4. Haiku is reliable for structured output. The key is that facts are already gathered and prepared — Haiku isn't reasoning about incomplete information, it's filling a template.

5. Write skills as orchestrators, not executors. Once you see skills as orchestrators, you stop trying to make them smart at generation. They become thin procedural wrappers: gather context, delegate, return the result.

What's Next

The pattern is now standard in my homelab workflow. The broader goal is an autonomous infrastructure agent that monitors the homelab, detects problems, proposes fixes, and writes runbooks — all with optimal cost distribution across Claude's model family.

For now: ~60% cost reduction on frequent skill invocations. Not bad for inverting a mental model.