
Two Commands That Fixed My Biggest Claude Code Problems

· 6 min read
ai tooling claude-code workflow

I stopped attempting large features with Claude Code.

Not because the model couldn’t handle them — it absolutely could — but because the conversation couldn’t survive them. Twenty minutes into a cross-layer refactor, the AI would forget constraints I’d set at the start. Forty minutes in, it was hallucinating function signatures that didn’t exist in my codebase. By the hour mark I’d abandoned the session and started over, re-explaining everything from scratch.

This happened every day. And alongside it, a quieter problem: every correction I made — “don’t mock the database,” “use the service client, not OData,” “stop summarizing your own work” — vanished the moment the session ended. Next morning, same mistakes. I was the memory, and I was tired of it.

I built two commands to fix both problems. They changed how I work with AI more than any model upgrade has.

The Context Window Tax

Context is the scarcest resource in AI-assisted development. Not compute, not model capability — context. Every file the AI reads, every plan it makes, every line of code it generates eats into a fixed budget. When that budget runs out, the AI doesn’t crash. It degrades. Quietly. It drops constraints you set early in the conversation. It confuses one module’s patterns with another’s. It starts writing code that looks plausible but doesn’t match your codebase.

The insidious part is that this changes your behavior. You stop asking for ambitious things. A feature that touches the API layer, the domain model, and the test suite? You know the context window won’t survive it. So you break it into three separate sessions, manually threading context between them, doing the project management yourself. The AI becomes a fast typist you have to micromanage.

The AI Praxis methodology helped — progressive disclosure, focused documents, smaller context loads. But the architectural problem remained: a single thread doing everything is a monolith. And monoliths don’t scale.

/orchestrate: The Project Manager Pattern

The fix was to stop letting the AI be the developer. Make it the project manager instead.

/orchestrate turns the main Claude Code thread into a coordinator that never touches code directly. It reads the task, breaks it into work units, and delegates each one to a specialized sub-agent — an Explore agent for investigation, a language-specific agent for implementation, a TDD agent for tests, a reviewer for quality checks. Each sub-agent gets its own context window, a focused prompt with full project context, and a specific deliverable. The orchestrator holds only the plan and two-sentence summaries of what each agent found.

Here’s what that looks like in practice. Say you need to add a feature that requires understanding how similar features work in your codebase, implementing a new handler, writing tests, and reviewing the result. Without /orchestrate, that’s one long conversation where investigation, coding, and review all compete for the same context window.

With it: three Explore agents investigate in parallel — one reads the requirements, one finds existing patterns, one checks for prior related work. Results come back as summaries. Then a language-pro agent implements the handler while a TDD agent writes tests simultaneously, each in their own context. Finally, a code-reviewer agent checks the result. The main thread stays under 20% context usage throughout.
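The coordination pattern above can be sketched in a few lines. This is a minimal illustration, not the actual /orchestrate implementation — `run_agent` is a hypothetical stand-in for spawning a Claude Code sub-agent, and the roles and prompts are only there to show the shape: parallel investigation, parallel implementation and tests, sequential review, with the coordinator holding nothing but summaries.

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(role: str, prompt: str) -> str:
    # Hypothetical placeholder -- in Claude Code this would be a sub-agent
    # invocation with its own fresh context window. It returns only a short
    # summary, never raw code or file contents.
    return f"[{role}] summary of: {prompt}"

def orchestrate(task: str) -> list[str]:
    """Coordinate a task; the main thread never touches code directly."""
    # Phase 1: three Explore agents investigate in parallel,
    # each in an independent context.
    explore_prompts = [
        f"Read the requirements for: {task}",
        f"Find existing patterns similar to: {task}",
        f"Check prior related work for: {task}",
    ]
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(lambda p: run_agent("explore", p),
                                 explore_prompts))

    # Phase 2: implementation and tests run simultaneously, seeded
    # with the Phase 1 summaries only -- not the raw exploration output.
    brief = " | ".join(findings)
    with ThreadPoolExecutor() as pool:
        impl = pool.submit(run_agent, "language-pro",
                           f"Implement: {task}. Context: {brief}")
        tests = pool.submit(run_agent, "tdd",
                            f"Write tests for: {task}. Context: {brief}")
        results = [impl.result(), tests.result()]

    # Phase 3: a single reviewer checks the combined result.
    review = run_agent("code-reviewer", f"Review the result of: {task}")
    return findings + results + [review]
```

The design point is in what `orchestrate` returns and stores: six short strings, not six context windows’ worth of code. That is what keeps the main thread under 20% usage.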

Sessions that used to need three to eight continuations — where I’d hit the limit, copy my notes, start a new conversation, re-explain the task — now complete in one. The AI stays sharp because its own context window isn’t stuffed with raw code. It only holds decisions.

The mental model shift matters as much as the technical one. You stop thinking about what fits in one conversation and start thinking about task decomposition. The context limit becomes a decomposition signal, not a ceiling.

The Groundhog Day Problem

Even with orchestration keeping sessions efficient, a second problem persisted: Claude Code doesn’t learn between sessions.

Every conversation starts from zero. The same friction points, the same corrections, the same “no, we use X not Y” exchanges. I’d fix the AI’s behavior fifteen times in a week — testing approach, naming conventions, response verbosity — and none of it carried forward. AGENTS.md captures static project knowledge, and Claude Code’s memory system can store behavioral rules. But the gap is in the middle: the friction patterns that emerge from daily use never get captured because you’re too busy working to stop and write them down.

Methodology without memory is just documentation.

/introspect: Mining Your Own Transcripts

/introspect closes this gap by reading your actual conversation history. Claude Code stores every session as a JSONL transcript. /introspect scans the recent ones — typically five to eight of the largest — and mines them for friction signals.

It looks for corrections (“no,” “wrong,” “not that,” “I said”), repeated questions you asked more than once, user interruptions, frustration signals (ALL CAPS, “!!!”, terse one-word replies), and manual multi-step workflows where you guided the AI through something step by step that should have been automated.
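Because the transcripts are JSONL — one JSON object per line — this kind of mining is straightforward. Here’s a rough sketch of the idea, with two loud caveats: the transcript location and per-line schema shown are assumptions about Claude Code’s storage format and may differ on your machine, and the signal detection is deliberately crude compared to what /introspect actually does.

```python
import json
from collections import Counter
from pathlib import Path

# Assumed transcript location and schema -- verify against your own setup.
TRANSCRIPT_DIR = Path.home() / ".claude" / "projects"

CORRECTION_PHRASES = ("no,", "wrong", "not that", "i said")

def user_messages(transcript: Path):
    """Yield the text of user turns from one JSONL session file."""
    for line in transcript.read_text().splitlines():
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines
        if event.get("type") == "user":
            content = event.get("message", {}).get("content", "")
            if isinstance(content, str):
                yield content

def friction_signals(transcript: Path) -> Counter:
    """Count crude friction signals in one session."""
    signals = Counter()
    for text in user_messages(transcript):
        lower = text.lower()
        if any(p in lower for p in CORRECTION_PHRASES):
            signals["correction"] += 1
        if text.isupper() or "!!!" in text:
            signals["frustration"] += 1
    return signals

# Scan the five largest transcripts, mirroring /introspect's heuristic
# of focusing on the biggest recent sessions.
if TRANSCRIPT_DIR.exists():
    biggest = sorted(TRANSCRIPT_DIR.rglob("*.jsonl"),
                     key=lambda p: p.stat().st_size, reverse=True)[:5]
    for t in biggest:
        print(t.name, dict(friction_signals(t)))
```

Even this naive version surfaces patterns: a “correction” count of four in one week’s transcripts is a rule waiting to be written down.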

Then it cross-references what it found against your existing setup — current memories, existing commands, AGENTS.md rules — to avoid duplicates. And it doesn’t just report findings. It implements them. New feedback memories with the rule, the reason why, and how to apply it. New slash commands for workflows you repeated three or more times. Documentation patches for gaps it identified.

After a busy week, I ran /introspect for the first time. It found that I’d corrected the AI about testing approach four times across three sessions. It found I’d manually guided a deployment workflow step-by-step twice. It found six instances where I’d interrupted the AI mid-summary because I didn’t want a recap of what I could already see in the diff.

It created: a feedback memory about the testing approach (with the “why” so the AI could judge edge cases, not just follow a rule), a new slash command automating the deployment workflow, and a feedback memory about terse responses. The following week, all three corrections were already baked in. I didn’t repeat any of them.

The key insight: the best configuration is extracted, not authored. You can’t anticipate every friction point upfront. But you can mine them from real usage data after the fact.

The Feedback Loop

The two commands compound. /orchestrate keeps sessions efficient and generates rich transcripts from productive work. /introspect mines those transcripts and feeds improvements back into the system — better memories, better commands, tighter AGENTS.md rules. The next /orchestrate session benefits from all of it.

Each cycle tightens the loop. After a month of weekly /introspect runs, the AI feels like it “knows” your working style. Not because the model improved — because your local configuration evolved based on real friction data. The system compound-learns from your corrections instead of forgetting them.

This is the difference between using an AI assistant and developing a working relationship with one.

What Changed

Three months ago, I restarted conversations every twenty minutes on complex tasks. I repeated the same five corrections every session. I avoided multi-file features because the context window couldn’t handle them.

Now, orchestrated tasks run for an hour without degradation. Corrections stick across sessions. I attempt things I would have split into a week of manual work — and they land in a single sitting.

The commands aren’t magic. They’re structural fixes for two fundamental limitations: finite context and absent memory. One decomposes work so the AI stays sharp. The other closes the learning loop so the AI gets better over time.

Both ship as part of AI Praxis, an open-source methodology system that bootstraps your AI coding assistant for your specific project. The commands are markdown files — no runtime dependencies, no plugins, no vendor lock-in. Clone the repo, point your AI tool at it, and it generates everything tailored to your stack.

I used to treat context limits as a ceiling. Now they’re a decomposition signal — proof that the task was too big for one thread, not too big to attempt.
