The Living Plan Got Fat: Compacting a Doc That Won't Stop Growing
In a post a few weeks ago, I took a victory lap. I’d finally found a way to work with AI coding agents that didn’t collapse into chaos or drown me in specs. It was a single living PLAN.md that the agent re-reads every session, where every decision gets funneled in and logged. I ended that post on a line I was proud of: the doc is the deliverable, the code is the byproduct, and over 30 days my PLAN.md got touched 16 times, more than any actual code file in the repo.
That line was true. It was also the setup for the next problem, which I did see coming. Lately I’ve been talking only to Opus and having Sonnet minions do all the grunt work, and it was really working out. I could work for a couple of hours or more and context barely got over 10%, and I’m only on the Pro plan. I just thought I had a little more runway.
The exact property that makes a living plan work is what eventually turns it into a chore. A doc you re-read every single session is great at 3,000 words. At 28,000 words it’s a lot to load before any thinking starts. This post is about the maintenance layer, and the skill I built so I’d never have to think about it manually again.
If you didn’t read the first post, the one-line version: instead of vibe coding or a pile of spec files, I keep one PLAN.md that records decisions, open questions, and a session log, and the agent treats it as the source of truth.
- The Good Problem
- Progressive Disclosure, Which I Was Already Doing Everywhere Else
- What “Cooled” Actually Means
- The Payoff: 28,357 Words Down to 8,332
- It’s a Process
- Why It Became a Skill: The Other Project Running This
- The Skill Is a Router, Not a Script
- The Honest Scope Note
- What’s Next
The Good Problem
The living plan worked so well that I kept feeding it. Every architectural call went in as a numbered decision, sessions were appended as log entries, and every half-formed “we should maybe…” landed in a parking lot instead of me forgetting it.
Then one morning I ran wc -w on the PLAN.md for content-tools-v2, my content pipeline project, and it told me 28,357 words.
Twenty-eight thousand words. That’s a novella. Which means every session I was spending context budget and my own attention re-skimming decisions D1 through D18, none of which had changed in over a month. D5 got decided in week two and never reopened. Why was I carrying its full body into every session four months later?
The decisions I’m actively wrestling with this week are maybe 20% of the document. The other 80% is settled history. It’s important history, but it doesn’t need to be in my face every session. It needs to be findable, not present.
Progressive Disclosure, Which I Was Already Doing Everywhere Else
Progressive disclosure is how I already work with CLAUDE.md in my projects. It doesn’t inline the architecture and the quickstart and the plan. It points at them: “see QUICKSTART.md, see PLAN.md.” The conductor file stays small; the heavy docs are one hop away.
I was using progressive disclosure everywhere except the one document that needed it most. Keep the hot layer (live decisions, open questions, recent sessions) right there in PLAN.md. Move the cold layer out to a docs/ tree, and leave a one-line pointer behind.
A collapsed decision still shows its heading. You still see that it exists and it links straight to the full body. Nothing is hidden.

Quick naming note, because I’m about to use one word a lot. I call this compacting the plan. When I first built the skill I named the operation “rebalancing”, and that name is now baked in. It’s the trigger word, it’s the name of the log file. So “rebalance” is going to keep showing up in this post whether I like it or not. But it’s the wrong word. I’m not balancing two things against each other toward some equilibrium. I’m compacting the hot doc and paging the cold half out to linked files on disk. So: compacting. (The skill keeps its dumb name until post 3, where it finally earns a better one.)
What “Cooled” Actually Means
This is the part that took real thought, because “just move the old stuff” is not a rule a tool can follow. Not everything cools the same way, and the heuristic has to be specific enough that I can hand it to an agent and trust the result. Here’s what I landed on:
Session logs cool by recency. Keep the last few inline (I keep three), archive everything older into docs/sessions/. This is almost always the heaviest cold mass in the whole document and it’s the safest thing to move, because a six-week-old session log is pure archive. Nobody’s making decisions off it.
Decisions stay whole while they’re hot, and collapse to a heading-plus-link when they cool. A decision is “hot” if it’s recent, still being referenced, or actively shaping current work. It cools when it’s settled, built, and nothing live points at it anymore. The exception: anything marked 🔒 foundational stays whole no matter how old it is, because the whole architecture leans on it. In content-tools that’s D23, the composable-pipeline decision the entire engine is built on. That one’s never getting collapsed.
Questions and parking-lot items are born as pointers. This was the cleanest insight. A parking-lot item doesn’t need to be classified later as hot or cold. The heading is the item. Only the question itself lives in PLAN.md. The discussion, if there is any, lives in a file in docs/. There’s no re-classification step because it started in the right shape.
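
Those rules are specific enough to sketch in code. What follows is a minimal Python illustration, not the skill’s actual implementation: the `Decision` record, its field names, and the helper functions are all hypothetical, and how “recent” gets judged is left to the caller.

```python
from dataclasses import dataclass

KEEP_RECENT_SESSIONS = 3  # the post keeps the last three session logs inline


@dataclass
class Decision:
    """Hypothetical record for one numbered decision in PLAN.md."""
    number: int
    settled: bool               # decided and built
    live_references: int        # how many hot items still point at it
    recent: bool = False        # judged by the caller; no fixed cutoff assumed
    foundational: bool = False  # carries the lock marker in the plan


def split_sessions(session_numbers: list[int]) -> tuple[list[int], list[int]]:
    """Return (archive, keep): everything except the most recent few cools."""
    ordered = sorted(session_numbers)
    return ordered[:-KEEP_RECENT_SESSIONS], ordered[-KEEP_RECENT_SESSIONS:]


def decision_is_cold(d: Decision) -> bool:
    """Cold means settled, not recent, unreferenced, and not foundational."""
    if d.foundational:
        return False  # foundational decisions stay whole regardless of age
    return d.settled and not d.recent and d.live_references == 0
```

Questions and parking-lot items have no classifier here on purpose: they start life as pointers, so there is nothing to re-classify.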
The before-and-after looks like this. A hot decision sits in PLAN.md with its full body:
### D19 — Slot-3 stage reshape: deduplicator-as-gate, moved to slot 2,
contingent on empirical templating
[...a few hundred words of reasoning, tradeoffs, and what it depends on...]
A cooled one collapses to a single line that still tells you what it is and where to read it:
### D5 — Per-workspace knowledge base: curated wiki built from a raw dump → [full text](docs/decisions/D05.md)
You lose nothing. You can still see D5 exists and you’re one click from the full doc. You just stopped paying for it in every session.
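
The collapse itself is mechanical once a decision has been judged cold. Here is a hedged Python sketch of that step; `collapse_decision` is a hypothetical helper, and the zero-padded `docs/decisions/D05.md` naming is simply copied from the example above rather than taken from the skill’s source.

```python
import re

# Matches a decision heading such as "### D5 <separator and title>".
HEADING = re.compile(r"^###\s+D(\d+)\b(.*)$")


def collapse_decision(heading_line: str,
                      decisions_dir: str = "docs/decisions") -> tuple[str, str]:
    """Return (pointer_line, file_path) for a cooled decision heading.

    The heading text is kept verbatim; only a link to the full body is added.
    """
    match = HEADING.match(heading_line.strip())
    if match is None:
        raise ValueError(f"not a decision heading: {heading_line!r}")
    number = int(match.group(1))
    rest = match.group(2).rstrip()
    path = f"{decisions_dir}/D{number:02d}.md"
    return f"### D{number}{rest} → [full text]({path})", path
```

The caller would write the original body to the returned path first and only then swap the heading for the pointer line, so there is never a moment when the text exists nowhere.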
The Payoff: 28,357 Words Down to 8,332
I ran the first real compaction on content-tools-v2 on June 19th. The PLAN.md went from 28,357 words to 8,332. Roughly a 70% cut, and not a single byte of history was deleted. It was relocated.
Here’s the docs/ tree it produced:
docs/
├── decisions/
│ ├── D01.md
│ ├── D02.md
│ ├── ...
│ └── D18.md
├── sessions/
│ └── sessions-01-18.md
├── questions/
├── parking-lot/
├── futures.md
└── reference-prior-artifacts.md
Eighteen settled decisions filed one-per-file. Sessions 1 through 18 collapsed into a single archive. Resolved questions, speculative futures, and the pile of “prior planning artifacts” reference material all moved out.
What’s left in PLAN.md is the hot layer only: the live decisions D19 through D24 with their full bodies, the open questions, and the active parking lot. Everything that’s settled is exactly one link away.
If the doc is the deliverable (which was the whole thesis of post 1), then this isn’t busywork. This is just refactoring the deliverable. You don’t delete the git history when you clean up a codebase. You file it.
It’s a Process
Now, I could have stopped there. One afternoon and done. But I didn’t want to redo it by hand every month, because the thing keeps growing. The plan is a living document. It got fat once; it’ll get fat again. A one-off fix for a recurring problem is just a chore you’ve scheduled for future-you.
So I built it as a repeatable, checked-off process with two triggers.
The first trigger is a phrase. I say “rebalance” (the skill’s word, not mine) and it runs. Simple.
The second is a size nudge. When the skill gets invoked, it checks the word count, and if PLAN.md is over a threshold (default 15,000 words) it surfaces it: “PLAN.md is 27.6k words, ~12k over threshold, want to rebalance?” The key detail here, and the one I had to learn the hard way, is that the threshold is measured by wc -w PLAN.md, not by live context percentage. The agent cannot reliably read its own context usage.
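
The size check is simple enough to sketch. This is an illustrative Python version under stated assumptions: `count_words` only approximates `wc -w` (both count whitespace-delimited words, but they can disagree on unusual Unicode whitespace), and the function names and exact message wording are mine, not the skill’s.

```python
from pathlib import Path
from typing import Optional

DEFAULT_THRESHOLD = 15_000  # the default word threshold mentioned in the post


def count_words(text: str) -> int:
    """Count whitespace-delimited words, roughly what wc -w reports."""
    return len(text.split())


def plan_word_count(path: Path) -> int:
    """Measure the file on disk instead of asking the agent about its context."""
    return count_words(path.read_text(encoding="utf-8"))


def size_nudge(word_count: int, threshold: int = DEFAULT_THRESHOLD) -> Optional[str]:
    """Return a nudge message when the plan is over threshold, else None."""
    if word_count <= threshold:
        return None
    over = word_count - threshold
    return (f"PLAN.md is {word_count / 1000:.1f}k words, "
            f"~{over / 1000:.1f}k over threshold, want to rebalance?")
```

Measuring the file keeps the trigger deterministic: the same PLAN.md gives the same answer in every session, whatever else is loaded.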
And then there’s a ledger. Every run writes to docs/rebalance-log.md (the file’s named for the old word too), recording each hot/cold call and the reason behind it. Here’s the actual entry from that first run:

### 2026-06-19 — rebalance (PLAN.md 28357 → 8332 words)
First run; link graph not yet built, so all calls were manual read-through.
- sessions 1–18 | archived → docs/sessions/sessions-01-18.md | recency, kept last 3 (19–21)
- D1–D18 (bodies) | cooled → docs/decisions/D01–D18.md | settled/built; nothing active references them
- D19–D22 | kept | recent, conservative call (keep-recent over cooling all of D1–D22)
- D23 | kept | 🔒 foundational (novel pipeline / D23 step-3 depends on it)
- D24 | kept | active (linking work, sessions 18–21)
- Q6, Q7 | archived → docs/questions/ | resolved (Q7→D20, Q6 closed)
- Q8 | kept | open (D24 is its first capability)
Why bother logging the reasons? Because the plan is that once a reason recurs often enough, it gets promoted to an automatic rule. The first few runs are pure per-run judgment. But “sessions older than the last three always archive” is a pattern that shows up every single time, so eventually that stops being a judgment call and becomes a rule the skill just applies. It’s the skill-hardener pattern, except pointed at my own process: per-run judgment now, automation later, and the ledger is what bridges the two.
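
Spotting a reason that recurs is a counting problem. The sketch below is one hypothetical way to do it in Python: it assumes ledger rows in the `- subject | action | reason` shape shown above, normalises a reason crudely to the text before its first comma, and uses a made-up `PROMOTE_AFTER` value, since how many recurrences count as “often enough” is not pinned down here.

```python
from collections import Counter
from typing import Optional

PROMOTE_AFTER = 3  # hypothetical cutoff; no actual number is given


def parse_ledger_row(line: str) -> Optional[tuple[str, ...]]:
    """Split '- subject | action | reason' into three fields, else None."""
    if not line.startswith("- "):
        return None
    parts = [part.strip() for part in line[2:].split("|")]
    return tuple(parts) if len(parts) == 3 else None


def reason_key(reason: str) -> str:
    """Crude normalisation: the text before the first comma, lower-cased."""
    return reason.split(",")[0].strip().lower()


def promotable_reasons(ledger_text: str, min_runs: int = PROMOTE_AFTER) -> list[str]:
    """Return reasons that appear in at least min_runs separate runs."""
    runs_seen: Counter = Counter()
    for run in ledger_text.split("\n### "):  # each run starts with a ### heading
        keys_this_run = set()
        for line in run.splitlines():
            row = parse_ledger_row(line)
            if row is not None:
                keys_this_run.add(reason_key(row[2]))
        runs_seen.update(keys_this_run)  # count each reason once per run
    return sorted(key for key, n in runs_seen.items() if n >= min_runs)
```

Counting once per run matters: a reason that shows up ten times in a single run is still one judgment call, not a pattern.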
Why It Became a Skill: The Other Project Running This
The thing that pushed this from “a tidy script for one project” to “a reusable skill” was realizing I had the exact same disease on another repo. And that it was a slightly different flavor of it.
My other live project is a knowledge-graph site I mentioned in the update section of post 1. It had a living PLAN.md too, around 14,700 words, but it had drifted off the rails of the workflow itself: a plan, yes, but no implementer agent and no docs/ tree to cool anything into. content-tools just needed a diet. This project needed the diet and a couple of missing organs put back first.
So the deliverable couldn’t be a one-off manual reorg of one repo. It had to be a reusable convention I could point at any project, including one that wasn’t even fully set up yet. That’s the plan-rebalance skill. It lives in my skillshare directory and syncs out to all my tools, so it’s available wherever I’m working. I ran it across both projects the same day, June 19th. On content-tools it did a straight rebalance. On the other project it did an adopt-plus-rebalance in one pass: created the implementer agent, scaffolded the docs/ tree, reconciled the conductor file, then trimmed the plan from 14,700 down to about 11,000 words.
But here’s the catch that made portability non-optional: my two projects don’t even agree on what their own files are called.
| | content-tools-v2 | nsf-scifi-wiki |
|---|---|---|
| Task file | TASK.md | TASKS.md (plural) |
| Conductor file | CLAUDE.md | AGENTS.md |
| Wrinkle | none | CLAUDE.md is a symlink to AGENTS.md |
If the skill had assumed TASK.md and CLAUDE.md, it would have written through the symlink instead of the real file and missed the task handoff entirely. So the skill discovers names, it never assumes them. It globs for the planning doc, tolerates singular-or-plural task files, and resolves the conductor symlink with readlink so it edits AGENTS.md directly instead of writing through the link. It even records what it found in that project’s own rebalance log, so the next run doesn’t have to re-derive any of it:
## Config
- Planning doc: PLAN.md
- Task handoff: TASKS.md (plural — project convention)
- Conductor: AGENTS.md (CLAUDE.md is a symlink → AGENTS.md; edit the real target)
- Implementer: implementer (.claude/agents/implementer.md, model: sonnet)
- Word threshold: 15000
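
To make “discovers names, never assumes them” concrete, here is a rough Python sketch. It is an assumption-laden stand-in for what the skill does, not its real code: the `PLAN*.md` glob pattern, the candidate file lists, and the `discover` function are all hypothetical, and `Path.resolve()` plays the role that `readlink` plays in the post.

```python
from pathlib import Path
from typing import Optional


def first_existing(root: Path, names: list[str]) -> Optional[Path]:
    """Return the first candidate file that exists under root, if any."""
    for name in names:
        candidate = root / name
        if candidate.exists():
            return candidate
    return None


def discover(root: Path) -> dict:
    """Find the planning doc, task file, and real conductor file by looking."""
    plans = sorted(root.glob("PLAN*.md"))  # hypothetical glob pattern
    task = first_existing(root, ["TASK.md", "TASKS.md"])  # singular or plural
    conductor = first_existing(root, ["CLAUDE.md", "AGENTS.md"])
    return {
        "planning_doc": plans[0].name if plans else None,
        "task_handoff": task.name if task else None,
        # resolve() follows a symlink, so edits land on the real target
        "conductor": conductor.resolve().name if conductor else None,
        "conductor_is_symlink": bool(conductor and conductor.is_symlink()),
    }
```

On a layout like the second project’s, this would report TASKS.md as the task handoff and AGENTS.md as the conductor, with the symlink flagged, which is the same information the Config block records.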
The Skill Is a Router, Not a Script
The other design constraint came straight out of post 1’s philosophy: the planning agent has to stay light. Its job is to plan and orchestrate, nothing else. If I stuff a thousand words of compaction logic into PLAN.md or CLAUDE.md, I’ve just re-bloated the exact files I’m trying to keep lean. That’d be self-defeating.
So the skill is a router. The heavy process logic is lazy-loaded from the skill’s own workflows/ and references/ files. The project only ever gets a one-line pointer that says “invoke this skill,” never a copy of the process. The skill’s own description sums up its job: it “owns the planning-doc workflow.” The project doesn’t have to know how compaction works. It just has to know who to call.
When you invoke it, the first thing it does is detect what state the project is in and route accordingly:
| State | What it means | Route |
|---|---|---|
| Greenfield | no planning doc at all | scaffold the whole workflow: PLAN/TASK/implementer agent/docs tree |
| Partial / adopt | a plan exists but pieces are missing | backfill only what’s missing, idempotently |
| Mature | full system, heavy doc | rebalance (the compaction pass) |

content-tools was the Mature case. It had the whole post-1 system already and just needed the diet. The other project was the Partial case: a plan, but no implementer agent and no docs tree, so it needed a backfill and a rebalance, which is exactly what it got, in a single run. A project can need more than one of these, and the skill reports what it found and lets me pick the actions; it doesn’t silently chain them.
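
The routing table above reduces to a small state check. This Python sketch is illustrative only: the boolean inputs, the `State` enum, and the action names are hypothetical labels for the rows of the table, and the function returns the actions on offer without running any of them.

```python
from enum import Enum


class State(Enum):
    GREENFIELD = "greenfield"
    PARTIAL = "partial"
    MATURE = "mature"


# Actions offered per state; the user picks, nothing is chained silently.
ROUTES = {
    State.GREENFIELD: ["scaffold"],
    State.PARTIAL: ["backfill", "rebalance"],
    State.MATURE: ["rebalance"],
}


def detect_state(has_plan: bool, has_task_file: bool,
                 has_implementer: bool, has_docs_tree: bool) -> State:
    """Classify a project by which pieces of the workflow already exist."""
    if not has_plan:
        return State.GREENFIELD
    if has_task_file and has_implementer and has_docs_tree:
        return State.MATURE
    return State.PARTIAL


def offered_actions(state: State) -> list[str]:
    """Return a copy of the actions to propose for this state."""
    return list(ROUTES[state])
```

A plan with no implementer agent and no docs tree comes back as `State.PARTIAL`, with both backfill and rebalance on offer.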
And every mutating action is propose-then-confirm. It shows me a manifest of what’s moving where before it touches anything. It never clobbers an existing file. I diffed the moved decision files against HEAD to confirm nothing got “helpfully” reworded on the way out. When a tool is rearranging the document that is my deliverable, I want a receipt.
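
Propose-then-confirm with a no-clobber guarantee can be sketched in a few lines of Python. The function names and the manifest format here are hypothetical; the one load-bearing detail is opening files in exclusive-create mode (`"x"`), which raises `FileExistsError` instead of overwriting.

```python
from pathlib import Path


def propose(moves: dict[str, str], root: Path) -> list[str]:
    """Build a human-readable manifest without touching the disk.

    moves maps a destination path (relative to root) to the text to file there.
    """
    lines = []
    for rel_path in sorted(moves):
        status = "SKIP (exists)" if (root / rel_path).exists() else "create"
        words = len(moves[rel_path].split())
        lines.append(f"{status}: {rel_path} ({words} words)")
    return lines


def apply_moves(moves: dict[str, str], root: Path, confirmed: bool) -> list[str]:
    """Write the files only after confirmation; never overwrite anything."""
    if not confirmed:
        return []
    written = []
    for rel_path, body in sorted(moves.items()):
        target = root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        try:
            # "x" is exclusive create: it fails if the file already exists.
            with open(target, "x", encoding="utf-8") as handle:
                handle.write(body)
        except FileExistsError:
            continue  # leave the existing file exactly as it was
        written.append(rel_path)
    return written
```

The list returned by `apply_moves` doubles as the receipt: it names exactly the files that were created, and anything skipped is visible by its absence.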
The Honest Scope Note
Every post in this series gets one of these, so here’s this one’s: none of this matters until the living plan has already won.
This is a good-problem-to-have tax. You only hit the 28,000-word wall because the plan worked well enough that you kept feeding it for weeks. If you build the compactor before you have a plan worth compacting, congratulations, you’ve just reinvented spec-kit ceremony in a different hat, which is the exact thing the entire first post was an argument against. Don’t do that. Get the plan working first. Let it get fat. Then put it on a diet.
There’s no clever algorithm here. It’s “move the old stuff to a folder and leave a link.” But the alternative is the plan slowly turning into the document you dread opening, and the day you start avoiding your own source of truth is the day the whole method quietly dies. The boring maintenance layer is what keeps the interesting part alive.
What’s Next
So the plan stays lean now. The hot layer is hot, the cold layer is filed, and the compactor keeps it that way on a trigger instead of on my willpower.
Except (and this is where post 3 picks up) keeping the plan lean exposed a completely different bottleneck on the build side. My setup hands one task at a time to a background Sonnet agent, and then I try to plan in the gap. But the gap was not big enough. When your plan is finally clean enough to generate work faster than your builder consumes it, the new question becomes: how do you keep the builder always busy?
That’s the next experiment: the batch throughput problem, and this same skill growing to own that part of the workflow too. (And, if I’m lucky, finally getting a name that I can actually remember and that makes sense.)
