Every trunk-based development guide tells the same story: stop using long-lived feature branches, merge to main frequently, use feature flags to hide incomplete work, and ship continuously. It is excellent advice. DORA research has consistently associated trunk-based development (TBD) with higher software delivery performance, and feature flags are the mechanism that makes it practical.
What none of these guides address is what happens six months later, when your main branch is littered with 150 flags from features that shipped weeks or months ago and nobody has cleaned up.
This is the dirty secret of TBD adoption: the same practice that eliminates merge conflicts and enables continuous delivery creates a new category of technical debt that accumulates silently and compounds aggressively. Teams trade branch hell for flag hell, and most do not realize it until the codebase is thoroughly infected.
This post covers why trunk-based development amplifies flag debt, how to build cleanup into TBD workflows, and what a sustainable flag lifecycle policy looks like for teams that ship to main every day.
How TBD amplifies the flag debt problem
Trunk-based development and feature flags are deeply coupled. You cannot practice TBD without flags (or a comparable mechanism), because incomplete work on main must be hidden from users. This coupling creates a structural incentive to create more flags than any other development workflow.
The math: TBD creates more flags, faster
TBD teams tend to create significantly more flags than branch-based teams because every feature, no matter how small, gets a flag. In a branch-based workflow, a three-day feature lives on a branch and gets merged when complete -- no flag needed. In TBD, that same feature gets a flag on day one, the code merges to main incrementally over three days, and the flag remains after the feature ships.
The intended lifetime of a TBD flag is short -- often just days. But the actual lifetime stretches to months because no automated mechanism removes flags after rollout. The gap between intended and actual lifetime is where debt accumulates.
Why TBD teams are worse at cleanup
Paradoxically, TBD teams are often worse at flag cleanup than branch-based teams, despite having more flags to manage. Three structural factors explain this:
1. Volume overwhelms manual processes. A team creating 25 flags per month cannot realistically track and clean up each one manually. Even a biweekly "flag cleanup" ritual addresses only a fraction of the backlog. The math does not work: if cleanup takes 30-60 minutes per flag (including code changes, testing, and review), 25 flags per month requires 12.5-25 hours of dedicated cleanup time every month. At the upper end, that is more than half of one engineer's working week.
2. Short-lived flags feel unimportant. Because TBD flags are intended to be temporary, they receive less documentation, less ownership assignment, and less lifecycle planning than flags in slower-moving workflows. A team using GitFlow might create a flag for a quarterly release and treat it as a significant artifact. A TBD team creates a flag for a two-day change and treats it as disposable scaffolding -- which is correct in theory but disastrous in practice when the scaffolding never gets torn down.
3. High velocity deprioritizes maintenance. TBD teams ship fast. That velocity creates constant forward pressure: there is always another feature to ship, another PR to merge, another rollout to monitor. Cleanup is perpetually "next sprint" work that never arrives because next sprint has its own forward-looking priorities.
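The arithmetic in the first point can be checked in a few lines of TypeScript. This is a minimal sketch: `estimateMonthlyCleanupHours` is a hypothetical helper, and the inputs are the figures quoted above, not measured data.

```typescript
// Hypothetical helper: converts a flag creation rate and a per-flag
// cleanup cost (in minutes) into a monthly range of cleanup hours.
interface CleanupEstimate {
  minHours: number;
  maxHours: number;
}

function estimateMonthlyCleanupHours(
  flagsPerMonth: number,
  minMinutesPerFlag: number,
  maxMinutesPerFlag: number,
): CleanupEstimate {
  return {
    minHours: (flagsPerMonth * minMinutesPerFlag) / 60,
    maxHours: (flagsPerMonth * maxMinutesPerFlag) / 60,
  };
}

// 25 flags per month at 30-60 minutes each.
const estimate = estimateMonthlyCleanupHours(25, 30, 60);
console.log(estimate); // { minHours: 12.5, maxHours: 25 }
```

Plugging in your own team's creation rate is a quick way to see whether a manual cleanup ritual can keep up.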
The compounding effect
These factors create a compounding cycle:
1. TBD team ships feature, creates flag
2. Feature rolls out successfully, team moves on
3. Flag remains, joins growing backlog of stale flags
4. Growing flag count makes codebase harder to navigate
5. Harder codebase slows development, creating pressure to ship faster
6. Team ships faster, creating more flags, with less time for cleanup
7. Return to step 3
This cycle is why TBD teams often have the highest flag counts and the highest stale percentages in organizations that practice multiple branching strategies across teams.
The anatomy of a TBD flag lifecycle
Understanding the intended lifecycle of a TBD flag reveals exactly where cleanup fails and where intervention is needed.
The ideal lifecycle
Day 0: Flag created, code begins merging behind flag
Day 1-5: Incremental merges to main, all behind flag
Day 5-7: Feature complete, flag enabled for internal testing
Day 7-10: Gradual rollout (10% → 50% → 100%)
Day 10-14: Monitoring period, flag at 100%
Day 14: Flag removed from code and flag management platform
Total lifecycle: ~14 days. This is realistic for a well-functioning TBD team with strong cleanup discipline.
What actually happens
Day 0: Flag created with a meaningful name
Day 1-5: Incremental merges to main, all behind flag
Day 5-7: Feature complete, flag enabled for internal testing
Day 7-10: Gradual rollout to 100%
Day 10-14: Monitoring period, everything looks good
Day 15: Team starts next feature, new flag created
Day 30: Original flag still at 100%, "we'll clean it up soon"
Day 60: Engineer mentions the flag in a code review, everyone agrees to remove it "next sprint"
Day 90: Flag is now "legacy," new hires assume it is load-bearing
Day 180: Flag has been 100% enabled for 6 months; removing it feels risky
Day 365: Flag is permanent infrastructure
The failure point is almost always Day 14 -- the moment when the flag has served its purpose and should be removed, but the team has already moved on. This is not a discipline failure; it is a workflow failure. The TBD workflow has no built-in mechanism for triggering cleanup.
Building cleanup into TBD workflows
The solution is not "be more disciplined about cleanup." That approach fails repeatedly because it relies on human memory and prioritization, both of which are unreliable when competing with the forward pressure of feature work. The solution is to make cleanup a structural part of the TBD workflow -- as automatic and unavoidable as running tests.
Strategy 1: Flag creation requires a cleanup ticket
Every PR that introduces a new feature flag must include a linked cleanup ticket with a due date. The cleanup ticket is created at the same time as the flag, not after the feature ships.
Implementation:
# Example: CI check that enforces flag-cleanup ticket linkage
# .github/workflows/flag-check.yml
name: Flag Cleanup Check
on: [pull_request]
jobs:
  check-flags:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history so origin/main is available for the diff
      - name: Check for new flags
        env:
          # Expose the PR description to the script as an environment variable
          PR_BODY: ${{ github.event.pull_request.body }}
        run: |
          # Detect flag evaluations on lines added by this PR
          # (skips the "+++ b/file" diff headers)
          NEW_FLAGS=$(git diff origin/main...HEAD --unified=0 | \
            grep -E '^\+[^+]' | grep -E "isEnabled|getFlag|useFeatureFlag" || true)
          if [ -n "$NEW_FLAGS" ]; then
            # Verify PR description contains a cleanup ticket reference
            if ! echo "$PR_BODY" | grep -qE "(JIRA|LINEAR)-[0-9]+.*cleanup"; then
              echo "::error::New feature flags detected. PR must include a cleanup ticket."
              exit 1
            fi
          fi
This approach works because it shifts the cost of cleanup planning to the moment of creation -- when the engineer has full context and the work is trivial. Creating a cleanup ticket takes 60 seconds at creation time but requires 30+ minutes of context reconstruction if deferred.
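The ticket check itself is easy to unit test outside CI. The sketch below mirrors the workflow's pattern in TypeScript; `hasCleanupTicket` is a hypothetical name, `JIRA` and `LINEAR` are placeholder project keys, and the match is case-insensitive here, which is a deliberate difference from the case-sensitive `grep`.

```typescript
// Placeholder project keys: substitute the keys your tracker actually uses.
// The ticket ID must appear before the word "cleanup" on the same line.
const CLEANUP_TICKET_PATTERN = /\b(JIRA|LINEAR)-\d+\b.*cleanup/i;

function hasCleanupTicket(prBody: string): boolean {
  // Check line by line, which is how grep evaluates the pattern.
  return prBody
    .split(/\r?\n/)
    .some(line => CLEANUP_TICKET_PATTERN.test(line));
}

const withTicket = hasCleanupTicket(
  'Adds new checkout flow.\nJIRA-123 cleanup ticket for new-checkout-flow',
);
const withoutCleanup = hasCleanupTicket('Adds new checkout flow. Fixes JIRA-123');
const wrongOrder = hasCleanupTicket('cleanup tracked in LINEAR-42');

console.log(withTicket, withoutCleanup, wrongOrder); // true false false
```

The third case shows a limitation of a pattern this simple: it is order-sensitive, so a PR template with a dedicated "Cleanup ticket:" field is probably more robust than free-text matching.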
Strategy 2: Expiration dates as code
Embed flag expiration metadata directly in the code, not just in your flag management platform. When the expiration date passes, the build warns or fails.
Implementation example (TypeScript):
// flags.ts - Flag registry with expiration enforcement
interface FlagDefinition {
  name: string;
  owner: string;
  createdAt: string;
  expiresAt: string;
  type: 'release' | 'experiment' | 'operational';
}

const FLAG_REGISTRY: FlagDefinition[] = [
  {
    name: 'new-checkout-flow',
    owner: 'checkout-team',
    createdAt: '2026-01-15',
    expiresAt: '2026-02-15',
    type: 'release',
  },
  {
    name: 'pricing-experiment-q1',
    owner: 'growth-team',
    createdAt: '2026-01-10',
    expiresAt: '2026-02-10',
    type: 'experiment',
  },
];

// CI/build-time check
function checkExpiredFlags(): string[] {
  const now = new Date();
  return FLAG_REGISTRY
    .filter(flag => flag.type !== 'operational')
    .filter(flag => new Date(flag.expiresAt) < now)
    .map(flag => `Flag "${flag.name}" (owned by ${flag.owner}) expired on ${flag.expiresAt}`);
}
Go implementation:
// flags.go - Flag expiration check, intended to run in CI (for example from a test).
// Go cannot enforce this at compile time; the check runs when CheckExpired is called.
package flags

import "time"

type FlagMeta struct {
	Name      string
	Owner     string
	ExpiresAt time.Time
	Type      string // "release", "experiment", "operational"
}

var Registry = []FlagMeta{
	{
		Name:      "new-checkout-flow",
		Owner:     "checkout-team",
		ExpiresAt: time.Date(2026, 2, 15, 0, 0, 0, 0, time.UTC),
		Type:      "release",
	},
}

// CheckExpired returns all non-operational flags past their expiration date.
func CheckExpired() []FlagMeta {
	var expired []FlagMeta
	now := time.Now()
	for _, f := range Registry {
		if f.Type != "operational" && f.ExpiresAt.Before(now) {
			expired = append(expired, f)
		}
	}
	return expired
}
The key insight: expiration dates in a flag management platform are advisory. Expiration dates in CI/CD are enforcement. In our experience, teams that embed expirations in code have significantly lower stale flag percentages than those relying on platform-side reminders alone.
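One way to implement "warns or fails" is a grace window: warn for the first few days after expiry, then fail the build. The sketch below assumes that design. `expiryStatus` and the `graceDays` parameter are hypothetical, and dates are ISO `YYYY-MM-DD` strings, which JavaScript parses as UTC midnight.

```typescript
type ExpiryStatus = 'ok' | 'warn' | 'fail';

interface ExpiringFlag {
  name: string;
  expiresAt: string; // ISO date, e.g. '2026-02-15'
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns 'ok' before expiry, 'warn' within the grace window, 'fail' after it.
function expiryStatus(flag: ExpiringFlag, now: Date, graceDays: number): ExpiryStatus {
  const daysOverdue = (now.getTime() - new Date(flag.expiresAt).getTime()) / MS_PER_DAY;
  if (daysOverdue <= 0) return 'ok';
  return daysOverdue <= graceDays ? 'warn' : 'fail';
}

const checkoutFlag: ExpiringFlag = { name: 'new-checkout-flow', expiresAt: '2026-02-15' };

const beforeExpiry = expiryStatus(checkoutFlag, new Date('2026-02-10T00:00:00Z'), 7);
const inGrace = expiryStatus(checkoutFlag, new Date('2026-02-20T00:00:00Z'), 7);
const pastGrace = expiryStatus(checkoutFlag, new Date('2026-03-01T00:00:00Z'), 7);

console.log(beforeExpiry, inGrace, pastGrace); // ok warn fail
```

Passing `now` as a parameter instead of reading the clock inside the function keeps the check deterministic in tests.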
Strategy 3: Automated cleanup PRs
The highest-leverage intervention is automating the cleanup PR itself. When a flag has been at 100% for a configurable period (typically 7-14 days for release flags), an automated system generates a PR that removes the flag evaluation, eliminates dead code branches, and cleans up associated test fixtures.
This is precisely what tools like FlagShark do: monitor your repositories, track flag lifecycle through PR analysis, detect when flags become stale, and generate cleanup PRs with the mechanical code changes already done. The engineer's job shifts from "find and remove stale flags" (30-60 minutes of tedious, error-prone work) to "review and merge a cleanup PR" (5-10 minutes of focused review).
Why automated PRs work for TBD teams specifically: TBD teams are already PR-centric. Every change goes through a pull request on main. Adding automated cleanup PRs to the flow requires zero workflow changes -- engineers review and merge them exactly like any other PR. The cleanup work happens in the same tool (GitHub/GitLab), with the same review process, and the same CI checks.
Why it works: Automation eliminates the two biggest barriers to cleanup: the effort of writing the PR and the inertia of starting the work. Teams using automated cleanup PRs tend to have dramatically higher cleanup ratios than teams relying on manual processes.
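The detection half of this strategy (deciding which flags are due for a cleanup PR) can be sketched as a simple filter. This is not how any particular tool implements it: `RolloutState`, `fullyRolledOutAt`, and `findCleanupCandidates` are hypothetical names, and the rollout data is assumed to come from your flag platform.

```typescript
interface RolloutState {
  name: string;
  type: 'release' | 'experiment' | 'operational';
  rolloutPercent: number;
  fullyRolledOutAt?: string; // ISO date the flag first reached 100%, if it has
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns names of non-operational flags that have been at 100%
// for at least `minDaysAt100` days.
function findCleanupCandidates(
  flags: RolloutState[],
  now: Date,
  minDaysAt100: number,
): string[] {
  return flags
    .filter(f => {
      if (f.type === 'operational' || f.rolloutPercent !== 100 || !f.fullyRolledOutAt) {
        return false;
      }
      const daysAt100 = (now.getTime() - new Date(f.fullyRolledOutAt).getTime()) / MS_PER_DAY;
      return daysAt100 >= minDaysAt100;
    })
    .map(f => f.name);
}

const candidates = findCleanupCandidates(
  [
    { name: 'new-checkout-flow', type: 'release', rolloutPercent: 100, fullyRolledOutAt: '2026-02-10' },
    { name: 'pricing-experiment-q1', type: 'experiment', rolloutPercent: 50 },
    { name: 'payments-kill-switch', type: 'operational', rolloutPercent: 100, fullyRolledOutAt: '2025-06-01' },
    { name: 'new-search-ui', type: 'release', rolloutPercent: 100, fullyRolledOutAt: '2026-02-25' },
  ],
  new Date('2026-03-01T00:00:00Z'),
  14,
);

console.log(candidates); // [ 'new-checkout-flow' ]
```

Generating the PR itself (removing the evaluation and the dead branch) is the harder half and is where dedicated tooling earns its keep.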
Strategy 4: Flag budgets per team
Set an explicit maximum number of active release and experiment flags per team. When a team hits its budget, new flags cannot be created until existing ones are cleaned up.
Recommended budgets for TBD teams:
| Team Size | Release Flag Budget | Experiment Flag Budget | Total Budget (excl. operational) |
|---|---|---|---|
| 2-4 engineers | 5 | 3 | 8 |
| 5-8 engineers | 10 | 5 | 15 |
| 9-15 engineers | 15 | 8 | 23 |
| 16+ engineers | 20 | 10 | 30 |
Enforcement: The budget check can run in CI. When a PR introduces a new flag and the team is at budget, the check fails with a message listing the team's current flags and their ages, making it immediately clear which flags should be cleaned up to make room.
This creates a natural feedback loop: the engineer who needs a new flag is incentivized to clean up an old one first. The cleanup happens as a prerequisite to forward progress, not as deferred maintenance.
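A budget check along these lines could look like the sketch below. The budgets come from the table above; everything else (`checkFlagBudget`, the `ActiveFlag` shape, treating teams smaller than two engineers as the first tier) is an assumption for illustration.

```typescript
type FlagType = 'release' | 'experiment' | 'operational';

interface ActiveFlag {
  name: string;
  type: FlagType;
  createdAt: string; // ISO date, e.g. '2026-01-15'
}

interface FlagBudget {
  release: number;
  experiment: number;
}

// Budgets from the table; operational flags are not budgeted.
function budgetForTeamSize(engineers: number): FlagBudget {
  if (engineers <= 4) return { release: 5, experiment: 3 };
  if (engineers <= 8) return { release: 10, experiment: 5 };
  if (engineers <= 15) return { release: 15, experiment: 8 };
  return { release: 20, experiment: 10 };
}

interface BudgetResult {
  allowed: boolean;
  message: string;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function checkFlagBudget(
  engineers: number,
  activeFlags: ActiveFlag[],
  newFlagType: 'release' | 'experiment',
  now: Date,
): BudgetResult {
  const limit = budgetForTeamSize(engineers)[newFlagType];
  // filter() returns a copy, so sorting does not mutate the caller's array.
  const sameType = activeFlags
    .filter(f => f.type === newFlagType)
    .sort((a, b) => a.createdAt.localeCompare(b.createdAt)); // oldest first

  if (sameType.length < limit) {
    return { allowed: true, message: `${sameType.length}/${limit} ${newFlagType} flags in use` };
  }
  const listing = sameType
    .map(f => {
      const ageDays = Math.floor((now.getTime() - new Date(f.createdAt).getTime()) / MS_PER_DAY);
      return `- ${f.name} (${ageDays} days old)`;
    })
    .join('\n');
  return {
    allowed: false,
    message: `Team is at its ${newFlagType} flag budget (${limit}). Clean up one of:\n${listing}`,
  };
}

const teamFlags: ActiveFlag[] = [1, 2, 3, 4, 5].map(n => ({
  name: `release-flag-${n}`,
  type: 'release' as const,
  createdAt: `2026-01-0${n}`,
}));
const now = new Date('2026-03-01T00:00:00Z');

const releaseCheck = checkFlagBudget(3, teamFlags, 'release', now);
const experimentCheck = checkFlagBudget(3, teamFlags, 'experiment', now);
console.log(releaseCheck.allowed, experimentCheck.allowed); // false true
```

Listing the oldest flags first puts the most likely cleanup candidates at the top of the failure message.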
Flag lifecycle policies for TBD teams
A flag lifecycle policy codifies expectations for how long flags should live, who is responsible for cleanup, and what happens when expectations are violated. TBD teams need stricter policies than branch-based teams because their flag creation rate is higher.
Recommended policy template
## Feature Flag Lifecycle Policy
### Flag Categories and Lifetimes
| Category | Maximum Lifetime | Expiration Action |
|----------|-----------------|-------------------|
| Release flag | 14 days after 100% rollout | Automated cleanup PR generated |
| Experiment flag | 7 days after experiment conclusion | Automated cleanup PR generated |
| Migration flag | 30 days after migration completion | Manual cleanup required, alert sent |
| Operational flag | No expiration | Annual review, documented re-approval |
### Ownership
- Every flag MUST have a named individual owner (not a team).
- Ownership transfers to the on-call engineer if the original owner leaves.
- The flag owner is responsible for merging the cleanup PR.
### Enforcement
- CI blocks PRs that create flags without: owner, category, expiration date.
- CI warns on PRs that modify code paths with flags older than their category lifetime.
- CI fails builds when team flag budget is exceeded.
- Weekly automated report sent to engineering managers with per-team flag health.
### Escalation
- Day 0-14: Flag within expected lifecycle. No action.
- Day 15-30: Flag past expected lifetime. Automated cleanup PR created.
- Day 31-60: Cleanup PR open but unmerged. Alert sent to team lead.
- Day 61-90: Flag significantly overdue. Escalation to engineering manager.
- Day 91+: Flag added to quarterly tech debt review agenda.
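If the escalation ladder is automated, it reduces to a lookup on the flag's age. The sketch below is one possible encoding: `escalationStage` and the stage names are hypothetical, and each upper bound in the template is treated as inclusive.

```typescript
type EscalationStage =
  | 'within-lifecycle'
  | 'cleanup-pr-created'
  | 'alert-team-lead'
  | 'escalate-to-manager'
  | 'quarterly-debt-review';

// Maps a flag's age in days to the escalation step from the policy template.
function escalationStage(flagAgeDays: number): EscalationStage {
  if (flagAgeDays <= 14) return 'within-lifecycle';
  if (flagAgeDays <= 30) return 'cleanup-pr-created';
  if (flagAgeDays <= 60) return 'alert-team-lead';
  if (flagAgeDays <= 90) return 'escalate-to-manager';
  return 'quarterly-debt-review';
}

console.log(escalationStage(10));  // within-lifecycle
console.log(escalationStage(45));  // alert-team-lead
console.log(escalationStage(120)); // quarterly-debt-review
```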
Adapting the policy to your team
The specific numbers in this policy are starting points. Calibrate them based on your team's deployment cadence and risk tolerance:
- Teams deploying multiple times daily can use shorter lifetimes (7 days for release flags) because the feedback loop is tight.
- Teams deploying weekly may need longer lifetimes (21-30 days) to account for slower rollout cycles.
- Teams in regulated industries may require longer monitoring periods before flag removal, extending lifetimes by 1-2 weeks.
The numbers matter less than the existence of explicit limits. A policy with generous deadlines that is enforced is infinitely better than a strict policy that is ignored.
CI/CD integration: Making cleanup unavoidable
The most effective TBD teams integrate flag lifecycle management directly into their CI/CD pipeline. Here is a practical integration architecture:
Pipeline integration points
PR Opened
├── Flag detection: Scan diff for new flag evaluations
│   ├── New flag found → Require metadata (owner, expiry, category)
│   ├── Check team flag budget → Block if exceeded
│   └── Post PR comment with flag summary
│
├── Stale flag check: Identify modified files containing stale flags
│   └── Warn if PR touches code paths with overdue flags
│
└── Merge to main
    ├── Update flag lifecycle tracking (flag created/modified/removed)
    ├── Trigger monitoring for flags at 100% rollout
    └── Schedule cleanup PR generation at expiration
GitHub Actions example
# .github/workflows/flag-lifecycle.yml
name: Flag Lifecycle
on:
  pull_request:
    types: [opened, synchronize]
permissions:
  contents: read
  pull-requests: write  # needed to post the summary comment
jobs:
  flag-audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Detect flag changes
        id: flags
        run: |
          # Count lines that add or remove a flag evaluation, skipping the
          # "+++"/"---" diff headers. grep -c already prints 0 when nothing
          # matches (and exits non-zero), so use "|| true", not "|| echo 0".
          ADDED=$(git diff origin/main...HEAD --unified=0 | \
            grep -E '^\+[^+]' | grep -cE "isEnabled|getFlag|BoolVariation|useFeature" || true)
          REMOVED=$(git diff origin/main...HEAD --unified=0 | \
            grep -E '^-[^-]' | grep -cE "isEnabled|getFlag|BoolVariation|useFeature" || true)
          echo "added=$ADDED" >> "$GITHUB_OUTPUT"
          echo "removed=$REMOVED" >> "$GITHUB_OUTPUT"
      - name: Post flag summary
        if: steps.flags.outputs.added > 0 || steps.flags.outputs.removed > 0
        uses: actions/github-script@v7
        with:
          script: |
            const added = ${{ steps.flags.outputs.added }};
            const removed = ${{ steps.flags.outputs.removed }};
            const net = added - removed;
            const body = [
              '### Feature Flag Summary',
              '| Metric | Count |',
              '|--------|-------|',
              `| Flag evaluations added | +${added} |`,
              `| Flag evaluations removed | -${removed} |`,
              `| Net change | ${net >= 0 ? '+' : ''}${net} |`,
              '',
              added > 0 ? '⚠️ New flags detected. Ensure cleanup tickets are linked.' : '',
              removed > 0 ? '✅ Flag cleanup detected. Thank you!' : '',
            ].join('\n');
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: body
            });
This is a simplified version. Production implementations use AST parsing (tree-sitter) rather than regex to avoid false positives from comments, strings, and variable names that happen to contain flag-like patterns. Tools like FlagShark provide this detection out of the box, supporting 11 languages and 8+ flag providers with zero configuration beyond installing the GitHub App.
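The counting step can also be written as a small function that is easier to test than a shell pipeline. The sketch below is still regex-based, so the false-positive caveat applies to it as well; what it adds is hunk tracking, so file headers are never counted. `countFlagChanges` is a hypothetical name, and the pattern is the same illustrative one used in the workflow.

```typescript
const FLAG_CALL = /isEnabled|getFlag|BoolVariation|useFeature/;

interface FlagDelta {
  added: number;
  removed: number;
  net: number;
}

// Counts added and removed lines that contain a flag evaluation
// in unified-diff text (the output of `git diff`).
function countFlagChanges(unifiedDiff: string): FlagDelta {
  let added = 0;
  let removed = 0;
  let inHunk = false;
  for (const line of unifiedDiff.split('\n')) {
    if (line.startsWith('diff --git')) { inHunk = false; continue; } // new file: headers follow
    if (line.startsWith('@@')) { inHunk = true; continue; }          // hunk header: content follows
    if (!inHunk || !FLAG_CALL.test(line)) continue;
    if (line.startsWith('+')) added++;
    else if (line.startsWith('-')) removed++;
    // Lines starting with a space are unchanged context and are ignored.
  }
  return { added, removed, net: added - removed };
}

const sampleDiff = [
  'diff --git a/src/checkout.ts b/src/checkout.ts',
  '--- a/src/checkout.ts',
  '+++ b/src/checkout.ts',
  '@@ -1,3 +1,4 @@',
  "-if (isEnabled('old-banner')) { showBanner(); }",
  "+if (isEnabled('new-checkout-flow')) { newCheckout(); }",
  "+const variant = getFlag('pricing-experiment-q1');",
  " const guard = isEnabled('payments-kill-switch');",
].join('\n');

const delta = countFlagChanges(sampleDiff);
console.log(delta); // { added: 2, removed: 1, net: 1 }
```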
The virtuous cycle
When flag lifecycle management is embedded in CI/CD, it creates a virtuous cycle that specifically benefits TBD teams:
- Every PR gets a flag summary. Engineers see their flag impact on every code change.
- Cleanup PRs appear automatically. Removal happens without anyone remembering to do it.
- Flag budgets prevent accumulation. Teams cannot outrun their cleanup capacity.
- Metrics track progress. Engineering managers see cleanup ratios and stale percentages on dashboards.
- The codebase stays clean. Clean codebases enable faster development, which is why the team chose TBD in the first place.
Measuring success: TBD flag health metrics
Track these metrics monthly to assess whether your TBD flag lifecycle management is working:
| Metric | Target for TBD Teams | Red Flag |
|---|---|---|
| Cleanup ratio (flags removed / flags added) | >0.8 | <0.5 |
| Average release flag age | <21 days | >45 days |
| Stale flag percentage | <25% | >40% |
| Time from 100% rollout to flag removal | <14 days | >30 days |
| Flags per engineer | <4 | >6 |
| Automated cleanup PR merge rate | >80% | <50% |
The most telling metric for TBD teams is "time from 100% rollout to flag removal." This measures the gap between when a flag becomes stale and when it is actually cleaned up. In a well-functioning TBD workflow, this number should be under two weeks. If it is over 30 days, your cleanup process has a structural problem that discipline alone will not fix.
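Three of these metrics can be computed from simple monthly counts. The sketch below uses the targets and red-flag thresholds from the table; the `MonthlyFlagStats` shape, the `rate` helper, and the middle "watch" band for values between target and red flag are assumptions added for illustration.

```typescript
interface MonthlyFlagStats {
  flagsAdded: number;
  flagsRemoved: number;
  activeFlags: number;
  staleFlags: number;
  engineers: number;
}

type Health = 'on-target' | 'watch' | 'red-flag';

interface FlagHealth {
  cleanupRatio: Health;
  stalePercent: Health;
  flagsPerEngineer: Health;
}

// Classifies a value against a target and a red-flag threshold.
function rate(value: number, target: number, redFlag: number, higherIsBetter: boolean): Health {
  if (higherIsBetter) {
    if (value > target) return 'on-target';
    return value < redFlag ? 'red-flag' : 'watch';
  }
  if (value < target) return 'on-target';
  return value > redFlag ? 'red-flag' : 'watch';
}

function assessFlagHealth(s: MonthlyFlagStats): FlagHealth {
  // Assumption: a month with no new flags counts as a ratio of 1.
  const cleanupRatio = s.flagsAdded === 0 ? 1 : s.flagsRemoved / s.flagsAdded;
  const stalePercent = s.activeFlags === 0 ? 0 : (s.staleFlags * 100) / s.activeFlags;
  const flagsPerEngineer = s.activeFlags / s.engineers;
  return {
    cleanupRatio: rate(cleanupRatio, 0.8, 0.5, true),
    stalePercent: rate(stalePercent, 25, 40, false),
    flagsPerEngineer: rate(flagsPerEngineer, 4, 6, false),
  };
}

const health = assessFlagHealth({
  flagsAdded: 25,
  flagsRemoved: 22,
  activeFlags: 40,
  staleFlags: 14,
  engineers: 5,
});
console.log(health);
// { cleanupRatio: 'on-target', stalePercent: 'watch', flagsPerEngineer: 'red-flag' }
```

In this example the team removes flags almost as fast as it adds them (22 of 25), yet still carries 8 active flags per engineer, which suggests an inherited backlog rather than a broken process.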
Common objections and responses
"We'll just do a flag cleanup sprint every quarter."
Quarterly cleanup sprints are the TBD equivalent of crash dieting -- dramatic short-term results that do not last. The math: a TBD team creating 25 flags per month accumulates ~75 flags per quarter. A cleanup sprint that removes 30-40 flags (an aggressive sprint) still leaves a net increase of 35-45 flags every quarter. Worse, sprint-based cleanup is demoralizing. Engineers do not want to spend a week removing flags instead of building features.
Continuous cleanup through automated PRs is the sustainable alternative. Small, frequent removals integrated into normal workflow beat large, painful cleanups every time.
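A short simulation makes the comparison concrete. It is a sketch under simplified assumptions: the backlog starts at zero, 25 flags are created every month, the quarterly sprint removes 40 flags and nothing is removed in between, and continuous cleanup removes 20 flags a month (a cleanup ratio of 0.8). `simulateBacklog` is a hypothetical helper.

```typescript
// Returns the flag backlog at the end of each month.
function simulateBacklog(
  months: number,
  createdPerMonth: number,
  removedInMonth: (month: number) => number,
): number[] {
  const history: number[] = [];
  let backlog = 0;
  for (let month = 1; month <= months; month++) {
    backlog = Math.max(0, backlog + createdPerMonth - removedInMonth(month));
    history.push(backlog);
  }
  return history;
}

// Quarterly sprint: remove 40 flags in months 3, 6, 9, and 12 only.
const quarterlySprint = simulateBacklog(12, 25, month => (month % 3 === 0 ? 40 : 0));
// Continuous cleanup: remove 20 flags every month.
const continuous = simulateBacklog(12, 25, () => 20);

console.log(quarterlySprint[11], continuous[11]); // 140 60
```

Even at a 0.8 ratio the backlog still grows, just far more slowly; holding it flat requires a ratio of 1.0.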
"Removing flags is too risky. What if something breaks?"
This is the most common objection, and it deserves a serious response. Flag removal is a code change, and code changes carry risk. But the risk is manageable and decreasing:
- Automated cleanup PRs run through the same CI pipeline as any other change. Tests catch regressions.
- AST-based removal (tree-sitter) is more precise than manual editing. Automated tools remove flag evaluations and dead code branches with structural understanding of the code, reducing the chance of accidental deletions.
- The risk of keeping stale flags is higher than removing them. Stale flags create confusion during incidents, slow down development, and accumulate interaction complexity. Every day a stale flag remains is a day it contributes to the next production issue.
- Start with flags at 100% for 30+ days. If a flag has been fully enabled for a month with no issues, removing it is as low-risk as code changes get.
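To make the risk argument concrete, here is what a typical cleanup amounts to for a flag that is already at 100%. All names and pricing rules are hypothetical; the point is only that the "after" version is the code path every user is already running.

```typescript
type FlagReader = (flagName: string) => boolean;

const SHIPPING_CENTS = 500;

// Legacy path: always charges shipping.
function legacyTotal(subtotalCents: number): number {
  return subtotalCents + SHIPPING_CENTS;
}

// New path: free shipping at 5000 cents and above.
function newTotal(subtotalCents: number): number {
  return subtotalCents >= 5000 ? subtotalCents : subtotalCents + SHIPPING_CENTS;
}

// Before cleanup: the flag check and the dead legacy branch are still present.
function checkoutTotalBefore(subtotalCents: number, isEnabled: FlagReader): number {
  if (isEnabled('new-checkout-flow')) {
    return newTotal(subtotalCents);
  }
  return legacyTotal(subtotalCents);
}

// After cleanup: the flag evaluation and the legacy branch are gone.
function checkoutTotalAfter(subtotalCents: number): number {
  return newTotal(subtotalCents);
}

// With the flag at 100%, both versions should agree on every input.
const flagAt100: FlagReader = () => true;
const samples = [0, 1999, 5000, 12000];
const identical = samples.every(
  cents => checkoutTotalBefore(cents, flagAt100) === checkoutTotalAfter(cents),
);
console.log(identical); // true
```

A comparison like the last few lines can double as a regression test in the cleanup PR itself.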
"Our flag management platform handles lifecycle."
Flag management platforms (LaunchDarkly, Split, Unleash) are excellent at flag evaluation, targeting, and configuration. They are not designed for codebase cleanup. There is a fundamental gap: your flag platform knows which flags are defined and their rollout state, but it does not remove the if (isEnabled('flag')) statements from your code.
Some platforms offer "code references" that show where flags are used, and "stale flag" indicators based on last evaluation time. These are useful signals, but they do not close the loop. The actual cleanup -- modifying source code, removing dead branches, updating tests -- requires a tool that operates on the codebase, not the flag platform.
This is the distinction between flag management and flag lifecycle management. TBD teams need both.
"We're too busy shipping to invest in flag cleanup tooling."
This is the trap. The reason you are too busy is partially because stale flags are slowing you down. Engineers navigating flag-heavy code paths lose meaningful time each week to flag-related inefficiency -- investigating flag states, maintaining dead code paths, reviewing code they do not fully understand. Across a team, that adds up to a significant portion of engineering capacity lost to flag debt.
Investing a few hours in setting up automated flag lifecycle management (whether through FlagShark, a custom CI integration, or an open-source tool like Piranha) recovers that capacity. The payback period is measured in days, not months.
The missing chapter in every TBD guide
Trunk-based development is the right approach for teams that want to ship fast and reduce integration risk. Feature flags are the right mechanism to make TBD practical. But the lifecycle management of those flags is the missing chapter in every TBD guide, blog post, and conference talk.
The playbook is straightforward:
- Acknowledge that TBD amplifies flag debt. Higher velocity means more flags, shorter intended lifetimes, and less time for manual cleanup.
- Categorize flags and set explicit lifetimes. Release flags are temporary (14 days). Operational flags are permanent. Treat them differently.
- Automate cleanup PR generation. Shift the work from "find and remove stale flags" to "review and merge cleanup PRs."
- Enforce flag budgets in CI. Prevent accumulation by requiring cleanup before new flag creation when at capacity.
- Measure cleanup ratio, not just flag count. The trend matters more than the snapshot.
The teams that get this right are the ones that fully realize the promise of trunk-based development: fast, safe, continuous delivery with a clean codebase. The teams that skip the cleanup chapter end up in a worse position than they were with long-lived branches -- trading visible merge conflicts for invisible flag debt that compounds silently until it becomes the dominant source of engineering friction.
Do not let flags become the hidden tax on your TBD workflow. Build cleanup into the system from day one, and flag debt becomes a solved problem rather than an inevitable consequence.