Why "Dumb" Agents Are Winning: The Case for Shell-First AI

Complexity is the enemy of reliability.

There’s a split happening in the AI coding space. On one side, you have agents that try to replicate the full IDE experience. They spin up language servers, parse ASTs, manage context windows, and attempt to reason like a senior engineer.

On the other side, you have tools like Pi (by badlogicgames) and OpenClaw (what I use). They’re comparatively stupid. No deep semantic understanding of your codebase. They mostly just run shell commands.

Guess which ones I actually use.

The bloat problem

I keep seeing people praise Pi for being simple. And honestly? That tracks.

When an agent tries to be too smart, things break. Parsing a large codebase to build the “perfect” context window takes forever. If the LSP crashes or the environment is slightly off, the whole thing hangs. And here’s the weird part: giving an LLM 100 files of context often confuses it more than giving it 3 relevant ones.

I’ve tried the “smart” agents. They feel like driving a Tesla that decides to update its firmware while you’re on the highway. Cool tech. But I just need groceries.

When smart agents choke

I keep seeing the same complaints on Twitter and HN. Someone asks an agent a simple question about a medium-sized codebase. The agent spends 3 minutes “analyzing” before it can answer where a function is defined.

Meanwhile grep -rn "func DoThing" . would have answered in 200 milliseconds.

Or the agent tries to refactor something and loads the entire project into context, including node_modules from a completely unrelated frontend folder. Then it hallucinates imports from packages that aren’t installed. The “smart” context selection makes things worse.

Maybe people are using these tools wrong. But the pattern seems consistent: agent tries to be clever, agent gets confused, developer wastes 10 minutes watching a spinner.

Shell-first

My self-hosted agent, Cici (running OpenClaw), works on a simpler principle: if you can do it in the terminal, she can do it too.

No magic “Refactor Codebase” button. She just runs:

grep -r "pattern" . to find files
read file.ts to see content
sed or write to change it
bun test to verify

This is how I work anyway. It’s transparent. If grep fails, I see the exit code. I don’t have to guess why some internal “thinking process” got stuck.

It’s not magic. It’s just unix.

The server migration

Yesterday, I migrated my home server. I asked Cici to find all .avi files, convert them to .mp4, and delete the originals.

A smarter agent might have analyzed video metadata, checked codecs using some library, or asked me about bitrate preferences.

Cici just ran find piped to ffmpeg.

find /mnt/data -name "*.avi" -exec ffmpeg -i {} ... \;

Brute force. Stupid. Worked perfectly while I was at the office.

Self-hosting makes this easier

The shell-first approach is easy to extend. If I want my agent to support a new tool, I don’t wait for a plugin update. I just install the CLI.

Need network speed? speedtest-cli. Docker management? Already there.

The agent is an extension of my terminal, which is the thing I actually know how to use.

The obvious downsides

Look, I’m not saying dumb agents are better at everything.

Cici is bad at renaming variables. She’ll grep for the string and replace it, which works until there’s another variable with a similar name or the string shows up in a comment. Then she breaks things.

She also can’t debug across multiple files. If an error involves understanding how three modules interact, she’s useless. She doesn’t hold that kind of context.

For those cases I just open Ampcode in Termius and paste in the prompt myself. It’s more manual but at least I know what context it’s working with.

Why I trust dumb tools more

This probably sounds backwards, but I trust the dumb agent more because I can see what it’s doing.

When Cici messes up, I see the command. I can run it myself. I can fix it.

When a smart agent messes up, I’m left guessing. Did it read the wrong file? Did it get confused by something in the context? Who knows. The failure mode is a 30-second spinner followed by nonsense.

The middle ground exists

Someone’s going to point out that tools like Sourcegraph or zread.ai solve the context problem without going full “dumb agent.” And yeah, fair. Code search that actually understands your codebase is different from an agent that tries to load everything into memory.

I haven’t used zread.ai but I’ve messed with Sourcegraph and it’s good at finding the right files fast. If something like that fed context into an LLM instead of the agent trying to figure it out itself, that might actually work.

Maybe the answer isn’t “dumb agents” vs “smart agents” but “agents that let humans pick context” vs “agents that guess.” I’m not sure. Still figuring it out.

Anyway

I’m probably wrong about half of this. The smart agents will get better. Someone will figure out how to do context selection without eating my entire node_modules folder.

But for now? I’ll take grep and an LLM over a black-box agent.

Why "Dumb" Agents Are Winning: The Case for Shell-First AI

Related Posts