Claude Code Just Became a CLI Orchestrator (And It's Kind of Perfect)
Someone on GitHub figured out you can make Claude Code treat your CLI like an interactive program. Not by adding AI to the CLI—by putting Claude in front of it. The CLI stays dumb, the skill file does the mapping, and the whole thing feels weirdly natural.
This started with an Android QA agent that records test scenarios for replay, but the pattern works for literally anything with a command-line interface. Deploy scripts, database migrations, accessibility audits—anything where you normally chain flags together manually.
The Setup: Your CLI, Claude's Problem
You build a basic CLI that accepts flags. Nothing fancy. Then you write a skill—a markdown file that teaches Claude how to map natural language to those flags. That's it.
The CLI wraps adb commands and records them. Claude reads the skill, parses what the user wants, and invokes the right commands. The user never sees flags. They just describe the test.
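For concreteness, here's a minimal sketch of what that kind of CLI can look like. The post doesn't reproduce the project's actual flag or command names, so `scenario`, `--device`, and `--perf` below are assumptions; the shape is the point: plain argparse, a couple of flags, shell out to adb.

```python
#!/usr/bin/env python3
"""Sketch of a dumb, flag-driven CLI. Flag names are illustrative assumptions."""
import argparse
import subprocess

def main() -> None:
    parser = argparse.ArgumentParser(description="Record a test scenario on an Android device")
    parser.add_argument("scenario", help="name of the scenario to record")
    parser.add_argument("--device", help="adb serial of the target device")
    parser.add_argument("--perf", action="store_true", help="collect frame-rate stats")
    args = parser.parse_args()

    # Target a specific device if one was given (adb's -s flag).
    adb = ["adb"] + (["-s", args.device] if args.device else [])
    subprocess.run(adb + ["wait-for-device"], check=True)

    # The actual recording / replay logic would go here.
    print(f"recording '{args.scenario}' (perf={args.perf})")

if __name__ == "__main__":
    main()
```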
Prompt-Driven Everything
Here's where it gets interesting. Instead of making users learn syntax, you teach Claude to detect intent and activate features automatically.
The skill file looks like this:
```markdown
Check the user's prompt for any of these keywords (case-insensitive):
"track performance", "frame rate", "fps", "rendering".

If any keyword matches, add `--perf` to the command.
```
That's the whole mechanism. Claude scans the prompt, matches keywords, passes the right flags. User says "measure performance while scrolling" and it just works.
You can stack these. Say "track performance and enable tracing" and Claude activates both features from a single sentence. Each feature has its own keyword list. The CLI just accepts --perf and --trace, writes config to a lock file, and the teardown script reads it later.
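Stacking is just more of the same in the skill file: one keyword block per feature. A hypothetical excerpt (the tracing keywords here are made up for illustration):

```markdown
## Performance tracking
Keywords (case-insensitive): "track performance", "frame rate", "fps", "rendering".
If any match, add `--perf` to the command.

## Tracing
Keywords (case-insensitive): "enable tracing", "trace", "systrace".
If any match, add `--trace` to the command.
```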
Human-in-the-Loop (When It Actually Matters)
Claude Code's AskUserQuestion tool lets you build programs that pause for real decisions and stay autonomous everywhere else.
Example: the tool needs to know which Android device to target. One device connected? It picks it. Multiple devices? Dropdown appears, user chooses. This pattern works for deploy targets, database selection, branch picking—defer to humans exactly when you should.
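In skill terms, that's a short instruction block. A sketch of what it could say (the project's exact wording isn't shown in the post):

```markdown
## Device selection
Run `adb devices` to list connected devices.
- Exactly one device: pass its serial as `--device` without asking.
- More than one: use AskUserQuestion to let the user pick a serial,
  then pass the chosen one as `--device`.
```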
Session Control: Start It With Skills, Stop It With Hooks
Useful pattern for anything with setup/teardown: skill starts the process, Claude Code hook guarantees cleanup.
The skill tells Claude to call a start script. Script creates a lock file tracking session state. When Claude finishes, stop script reads the lock file, handles teardown, cleans up.
But what if the user hits Ctrl+C? A Stop hook in .claude/settings.json catches it.
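A minimal sketch of that hook wiring in .claude/settings.json, assuming a hypothetical ./scripts/stop-session.sh teardown script (check the Claude Code hooks documentation for the exact schema your version expects):

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "./scripts/stop-session.sh"
          }
        ]
      }
    ]
  }
}
```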
The lock file is both a mutex (prevents overlapping sessions) and a state store (tells cleanup what to do). If Claude already stopped gracefully, lock file's gone, hook does nothing. Works for recording sessions, server processes, temp resources—anything with lifecycle management.
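And a sketch of what that teardown script might do, assuming the start script wrote a small JSON lock file with the scenario name, device serial, and active features (path and field names are assumptions):

```python
#!/usr/bin/env python3
"""Sketch of a teardown script the Stop hook can call."""
import json
from pathlib import Path

LOCK_FILE = Path("/tmp/qa-session.lock")  # assumed location, must match the start script

def main() -> None:
    if not LOCK_FILE.exists():
        # Claude already stopped the session gracefully; nothing to do.
        return

    state = json.loads(LOCK_FILE.read_text())

    if state.get("perf"):
        print("collecting perf results for", state.get("scenario"))

    # Real teardown (stopping recorders, pulling artifacts off the device) goes here.

    LOCK_FILE.unlink()  # release the mutex so the next session can start

if __name__ == "__main__":
    main()
```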
Building the Tool From Within (This Part's Unhinged)
The wild part: you build this tool from the same Claude Code session you use to run it. Claude distinguishes between "run this test" and "add a feature to the stop script" naturally.
Development loop is tight. Run the tool, notice something missing, say "add memory stats to session end." Claude reads existing code, understands patterns, implements the feature—and writes the skill markdown automatically.
None of the skill files in this project were created manually. Claude generates them as a byproduct of iteration. You describe behavior, Claude implements the script, then writes the skill teaching itself how to use it. Self-reinforcing cycle.
The Recipe
- Build a simple CLI that accepts flags and does one thing
- Write a skill mapping natural language keywords to flags
- Use AskUserQuestion for genuine ambiguities
- Add a hook for lifecycle guarantees
- Iterate from within—let Claude build the next feature while you use the current one
Think about manual processes that could use automation. Pulling Jira data, deployment checklists, mobile QA, accessibility audits. Anything where you follow steps a CLI could drive.
Build the CLI first. Keep it simple. Then let Claude use it. Skills emerge from real usage. The tool evolves naturally when your primary user can reason about it.
What This Actually Enables
This pattern turns Claude Code into an orchestration layer. Your CLI is the primitive. The skill is the interface spec. Claude is the interpreter.
You're not building AI tools. You're building tools that Claude knows how to use. That's a weirdly powerful distinction.
The project that proved this pattern: github.com/tobrun/android-qa-agent
Ship dumb CLIs. Let Claude make them smart.