There's a conversation that happens in almost every engineering organization that adopts feature flags. It goes something like this:
Engineering Manager: "We have 300 feature flags and half of them are stale. We need to clean them up."
Team Lead: "We're using LaunchDarkly. Can't it handle that?"
Engineering Manager: "It tells us the flags exist. It doesn't remove them from our code."
Team Lead: "...wait, then what exactly are we paying for?"
This exchange reveals a fundamental misunderstanding that costs organizations millions of dollars in accumulated technical debt. Feature flag management and feature flag cleanup are two different disciplines that solve two different problems. Most teams invest heavily in one and completely neglect the other, then wonder why their flag debt keeps growing despite having "a feature flag tool."
Understanding the distinction -- and why you need both -- is the first step toward actual flag lifecycle management rather than just flag lifecycle accumulation.
The flag lifecycle has two halves
Every feature flag has a lifecycle that looks roughly like this:
1. Creation: A flag is defined in a management platform with targeting rules, default values, and segments
2. Implementation: A developer writes code that evaluates the flag and branches behavior accordingly
3. Rollout: The flag is gradually enabled for increasing percentages of users
4. Completion: The rollout reaches 100% (or the experiment concludes, or the feature is killed)
5. Cleanup: The flag is archived in the management platform AND the conditional code is removed from the codebase
6. Verification: The codebase is confirmed to be free of references to the retired flag
Steps 1-4 are the domain of flag management platforms. Steps 5-6 are the domain of flag cleanup tools. The problem is that most teams treat steps 1-4 as the entire lifecycle and treat steps 5-6 as an afterthought -- or don't treat them at all.
The result: flags that are "done" in the management platform but still alive in the codebase, creating technical debt that compounds silently.
What flag management platforms actually do
Flag management platforms like LaunchDarkly, Split.io, Unleash, DevCycle, and Statsig are sophisticated systems that handle a critical set of responsibilities:
Runtime flag evaluation
The core function. When your application code calls ldclient.BoolVariation("new-checkout", user, false), the management platform evaluates the flag based on targeting rules, user segments, percentage rollouts, and prerequisites, then returns the appropriate value. This happens millions of times per second across the industry.
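The evaluation itself is conceptually simple. As an illustration only (this is not any vendor's actual algorithm), a percentage rollout can be sketched as a deterministic hash of the flag key and user ID:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// inRollout reports whether a user falls inside a percentage rollout.
// Hashing flagKey+userID makes the decision deterministic: the same user
// always gets the same answer for the same flag, which keeps the user's
// experience stable as the rollout percentage ramps up.
func inRollout(flagKey, userID string, percent uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(flagKey + ":" + userID))
	return h.Sum32()%100 < percent
}

func main() {
	// A 0% rollout never matches; a 100% rollout always does.
	fmt.Println(inRollout("new-checkout", "user-42", 0))   // false
	fmt.Println(inRollout("new-checkout", "user-42", 100)) // true
}
```

Production platforms layer targeting rules, segments, and prerequisites on top of this core idea, but the bucketing principle is the same.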
Targeting and segmentation
Management platforms maintain complex rule sets that determine which users see which flag values. You can target by user attributes, geographic regions, device types, account plans, or custom segments. This is genuinely powerful functionality that enables progressive rollouts, beta programs, and personalization.
Experimentation and analytics
Many platforms include A/B testing capabilities, measuring the impact of flag variations on product metrics. Statsig, Split, and LaunchDarkly all offer experiment analysis that helps teams make data-driven decisions about which variation to ship.
Operational controls
Kill switches, circuit breakers, and emergency toggles are managed through these platforms. The ability to instantly disable a feature across all users is one of the most compelling reasons to adopt feature flags in the first place.
Audit and compliance
Enterprise platforms maintain detailed audit logs of who changed what flag, when, and why. This is critical for regulated industries and security-conscious organizations.
All of this is valuable. None of it removes a single line of code from your codebase.
What flag management platforms don't do
Here's what happens when a feature rollout completes successfully and you archive a flag in your management platform:
- The flag disappears from the management dashboard (or moves to an archived state)
- The management platform stops serving evaluations for that flag
- Your code continues to contain every if/else branch, every BoolVariation call, every conditional import, and every test case related to that flag
The flag is "done" according to the management platform. But in your codebase, the dead code remains:
```go
// This code exists in production right now, for a flag archived 8 months ago
treatmentEnabled, _ := ldClient.BoolVariation("new-checkout-flow", user, false)
if treatmentEnabled {
    return processNewCheckout(ctx, order)
} else {
    // This branch hasn't been reachable for 8 months
    // but it still gets compiled, still gets tested,
    // and still confuses every developer who reads this file
    return processLegacyCheckout(ctx, order)
}
```
This dead code has real costs:
- Cognitive load: Every developer who reads this file must understand both branches and figure out which one is actually active
- Testing overhead: Test suites must cover both branches even though one is permanently unreachable
- Refactoring friction: Any change to the checkout flow must account for both code paths
- Build time: Dead code is still compiled and bundled
- Security surface: The legacy code path may contain vulnerabilities that would be eliminated by removal
Some management platforms offer features that help identify candidates for cleanup:
- LaunchDarkly Code References scans your codebase for flag keys and shows where they appear in the dashboard
- Unleash staleness markers highlight flags that have exceeded their configured lifetime
- DevCycle Code Usages tracks where flags are evaluated through SDK telemetry
These features are genuinely useful for visibility. But they all stop at identification. None of them generate the pull request that removes processLegacyCheckout, eliminates the if/else branch, cleans up the unused import, and updates the test suite. That's the "last mile" problem.
The last mile problem
In logistics, the "last mile" is the final leg of delivery -- getting a package from the distribution center to the customer's door. It's consistently the most expensive and complex part of the supply chain, accounting for up to 53% of total delivery costs.
Feature flag cleanup has its own last mile problem. Identifying that a flag is stale is the easy part. Actually removing the flag code is where the real work lives:
What "removing a flag" actually requires
Consider a feature flag that controls a new pricing page. Removing it means:
- Find every reference to the flag key across the entire codebase (not just the obvious evaluation call, but also test files, configuration, documentation, and feature flag wrappers)
- Determine the resolved value -- should the code path for true or false be kept? This requires understanding the flag's current state in the management platform.
- Simplify the conditional logic -- remove the if/else branch and keep only the active code path
- Eliminate dead code -- the unused branch, any functions called only from that branch, any imports used only by that branch
- Update tests -- remove test cases for the eliminated branch, simplify test setup that configured the flag
- Handle cascading changes -- if other flags depend on this flag, or if this flag is nested within other conditionals, the removal may trigger a cascade of simplifications
- Verify nothing breaks -- the removed code path must genuinely be unreachable, and the remaining code must function correctly
This is skilled engineering work. It requires understanding the code, the flag's purpose, and the relationships between components. For a simple flag, it might take 30 minutes. For a deeply embedded flag with cross-cutting concerns, it can take hours.
Multiply that by the 100-150 stale flags in a typical enterprise codebase, and the cleanup backlog represents weeks or months of engineering work.
This is why the cleanup never happens. It's important, but it's never urgent (until it is). It's tedious. It's risky -- what if you remove the wrong branch? And it's invisible to stakeholders -- nobody celebrates a PR that removes dead code.
The gap between management and cleanup
The result of this last mile problem is a growing gap between what the management platform shows and what the codebase actually contains.
| What the management platform says | What the codebase actually contains |
|---|---|
| 50 active flags | 50 active flags + 150 archived flags still in code |
| Clean flag inventory | 200+ conditional branches |
| Healthy flag lifecycle | No lifecycle tracking for code-level changes |
| Flag archived successfully | Dead code still shipping to production |
This gap grows over time. Every month, more flags complete their rollout and get archived in the management platform. Every month, those flags remain in the codebase. The management dashboard looks clean. The codebase looks increasingly like a maze of dead conditionals.
The cruelest irony: teams that use feature flags most effectively (frequent rollouts, many experiments, rapid iteration) accumulate the most flag debt because they create and archive flags at a higher rate.
What the numbers look like
Consider a team that creates 10 flags per month and successfully archives them in their management platform after rollout:
| Month | Flags Created (Cumulative) | Flags Archived in Platform | Stale Flags in Code | Code Debt |
|---|---|---|---|---|
| 3 | 30 | 15 | 15 | Low |
| 6 | 60 | 40 | 40 | Growing |
| 12 | 120 | 90 | 90 | Significant |
| 18 | 180 | 140 | 140 | Critical |
| 24 | 240 | 190 | 190 | Overwhelming |
By month 24, the management platform shows 50 active flags. The codebase contains conditional logic for 240 flags, 190 of which are dead. Every developer navigates this complexity daily.
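The dynamics behind that table reduce to a simple recurrence: code-level debt grows at the archival rate minus the cleanup rate. A toy simulation (the rates here are illustrative, not measured):

```go
package main

import "fmt"

// simulateDebt models code-level flag debt under simple assumptions:
// a fixed number of flags is archived in the platform each month, and
// a fixed number of flags has its code actually removed each month.
// Debt = archived flags whose conditional code still exists.
func simulateDebt(months, archivedPerMonth, cleanedPerMonth int) int {
	archived, cleaned := 0, 0
	for m := 0; m < months; m++ {
		archived += archivedPerMonth
		cleaned += cleanedPerMonth
		if cleaned > archived {
			cleaned = archived // can't remove code for flags not yet archived
		}
	}
	return archived - cleaned
}

func main() {
	// No cleanup capacity: debt grows linearly, roughly matching the table.
	fmt.Println(simulateDebt(24, 8, 0)) // 192 stale flags still in code
	// Cleanup capacity matching the archival rate holds debt at zero.
	fmt.Println(simulateDebt(24, 8, 8)) // 0
}
```

The takeaway is that debt is a rate problem, not a one-time problem: a single cleanup sprint resets the counter, but only a sustained cleanup rate keeps it at zero.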
Why you need both: The complete lifecycle stack
Complete flag lifecycle management requires tools that cover both halves of the lifecycle:
The management half (creation through rollout)
A flag management platform handles:
- Flag creation with targeting rules and segments
- Runtime evaluation at scale (milliseconds, millions of requests)
- Progressive rollout with percentage-based targeting
- Experimentation with statistical analysis
- Operational controls (kill switches, circuit breakers)
- Audit logging and compliance
Without a management platform, you're building flag evaluation logic from scratch, managing targeting in configuration files, and losing the ability to change flag states without deploying code.
The cleanup half (completion through verification)
A flag cleanup tool handles:
- Detecting flags in source code across multiple languages
- Tracking when flags were introduced and how they've changed over time
- Identifying flags that have become stale (at 100% rollout, archived in platform, unchanged for extended periods)
- Generating code changes that remove the flag evaluation, eliminate dead branches, and clean up related artifacts
- Creating pull requests for team review
- Verifying that all references have been removed
Without a cleanup tool, you're relying on manual code searches, quarterly cleanup sprints, and developer discipline to remove dead code -- none of which scale past a handful of flags.
How they work together
The ideal workflow integrates both halves:
- Developer creates a flag in the management platform with targeting rules
- Developer writes code that evaluates the flag and branches behavior
- Cleanup tool detects the new flag in the PR and begins lifecycle tracking
- Management platform handles rollout -- gradual percentage increase, monitoring, experimentation
- Rollout completes -- flag reaches 100% or experiment concludes
- Developer archives the flag in the management platform
- Cleanup tool detects staleness -- the flag has been at 100% for X days, or has been archived
- Cleanup tool generates a PR that removes the flag code, eliminates dead branches, and cleans up tests
- Team reviews and merges the cleanup PR
- Lifecycle complete -- the flag is gone from both the platform and the codebase
This workflow ensures no flag falls through the cracks between the management platform and the codebase. The management platform does what it does best (runtime evaluation, targeting, experimentation), and the cleanup tool does what it does best (code detection, lifecycle tracking, automated removal).
Common objections and why they don't hold up
"Our management platform has code references / staleness features"
This is the most common objection, and it's worth addressing directly. Yes, LaunchDarkly has Code References. Yes, Unleash marks stale flags. Yes, DevCycle tracks code usages. These features are valuable for visibility, but they don't solve the cleanup problem.
Code References tells you that checkout.go:47 contains a reference to new-checkout-flow. It doesn't generate the PR that removes lines 45-52, eliminates the processLegacyCheckout function, and updates the test file. That's the hard part.
Think of it this way: your management platform is like a GPS that can tell you where every piece of trash is on the highway. Useful information. But you still need someone (or something) to actually pick up the trash. Visibility without automation is just a more detailed view of the problem.
"We can handle cleanup manually"
Some teams genuinely can -- for a while. If you create fewer than 5 flags per month and have strong engineering discipline, manual cleanup is feasible.
But manual cleanup breaks down at scale for predictable reasons:
- Prioritization: Cleanup tickets consistently lose to feature work in sprint planning
- Context loss: The developer who created the flag may have left the team or the company
- Risk aversion: Nobody wants to be the person who broke production by removing the wrong code path
- Invisibility: There's no stakeholder celebrating a "removed 3 stale flags" PR
In our experience, teams relying on manual cleanup accumulate stale flags at a rate that significantly outpaces removal. The backlog grows until it's so large that a "cleanup sprint" can't make a meaningful dent.
"We'll build our own cleanup tooling"
This is a reasonable impulse, especially for teams with strong internal tooling cultures. But building effective flag cleanup tooling is harder than it appears:
- Multi-language support: Most codebases aren't monolingual. Supporting Go, TypeScript, Python, and Java requires separate AST parsers for each.
- Dead branch elimination: It's not enough to remove the flag evaluation call. You need to keep the correct branch, eliminate the dead branch, and handle the cascade of unused functions and imports.
- Lifecycle tracking: Point-in-time scans tell you what flags exist now. Lifecycle tracking tells you when they were introduced, how long they've been stale, and whether they're trending toward cleanup or away from it.
- Maintenance: Your custom tool needs to evolve as your codebase, SDKs, and patterns change.
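Even the "easy" part -- detecting flag call sites -- takes real machinery, and this is per language. A minimal Go-only sketch using go/ast, assuming LaunchDarkly-style BoolVariation calls (real tools must also handle wrappers, constants, and other SDKs):

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// findFlagCalls returns the flag keys passed to BoolVariation-style calls
// in a Go source string. This is only detection -- the easy half; keeping
// the live branch and deleting the dead one is where the real work is.
func findFlagCalls(src string) []string {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil
	}
	var keys []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok || sel.Sel.Name != "BoolVariation" || len(call.Args) == 0 {
			return true
		}
		// Only string-literal keys are caught here; keys built at runtime
		// are one of the many cases a production tool must handle.
		if lit, ok := call.Args[0].(*ast.BasicLit); ok && lit.Kind == token.STRING {
			if key, err := strconv.Unquote(lit.Value); err == nil {
				keys = append(keys, key)
			}
		}
		return true
	})
	return keys
}

func main() {
	src := `package p
func f() {
	on, _ := ldClient.BoolVariation("new-checkout-flow", user, false)
	_ = on
}`
	fmt.Println(findFlagCalls(src)) // [new-checkout-flow]
}
```

Multiply this by every language in your stack, then add dead-branch elimination and import cleanup on top, and the 2-4 month build estimate starts to look optimistic.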
Teams that build custom cleanup tooling typically spend 2-4 engineering months on the initial version and 4-8 hours per month on maintenance. Tools like FlagShark and Piranha exist specifically because this problem is common enough to justify dedicated solutions.
"Our flag management platform will add cleanup features eventually"
Perhaps. But the incentive structures don't align. Flag management platforms make money when you create more flags. Their core competency is runtime evaluation, targeting, and experimentation. Code-level refactoring is a fundamentally different technical problem that requires different expertise (AST parsing, code transformation, multi-language support).
Some platforms may add basic cleanup automation over time. But the depth of code understanding required for safe, automated flag removal is a hard problem that benefits from dedicated focus. This is why tools like Piranha (built by Uber specifically for this problem) and FlagShark (built as a purpose-focused SaaS) exist as separate tools rather than features within existing platforms.
The cost of the gap
The gap between flag management and flag cleanup has a real, observable cost. Based on what we have seen across engineering teams:
- Stale flags accumulate rapidly when only management tooling is in place. Teams without cleanup practices routinely carry 90+ stale flags after a year, while teams with active cleanup keep the number well under 20.
- Developer productivity suffers from navigating dead code paths, reviewing stale conditionals, and maintaining tests for unreachable branches.
- Code reviews take longer when reviewers must reason about multiple code paths controlled by flags that are permanently on.
- Incident resolution is slower when debugging requires understanding flag states across the codebase.
- New hire onboarding takes longer in codebases cluttered with flags whose purpose is unclear.
The difference between teams with and without cleanup practices is stark. Teams that invest in cleanup tooling alongside their management platform see dramatically lower flag debt, faster development cycles, and shorter onboarding times. The investment in a cleanup tool pays for itself through recovered engineering productivity.
Building a complete flag lifecycle strategy
If you're convinced that both halves matter, here's how to implement a complete lifecycle strategy.
Step 1: Audit your current state
Before adding tools, understand the scope of the problem:
- Count your flags: How many flags exist in your management platform? How many are archived?
- Scan your code: How many flag references exist in your codebase? How many reference archived flags?
- Calculate the gap: The difference between archived flags and code references is your cleanup debt.
For most teams, this audit is eye-opening. The gap is usually much larger than anyone expected.
Step 2: Choose your management platform (if you don't have one)
If you're not already using a flag management platform, choose one based on your team's needs:
| Need | Recommended Platform |
|---|---|
| Enterprise targeting + experimentation | LaunchDarkly or Split.io |
| Open-source / self-hosted | Unleash |
| Developer-first experience | DevCycle |
| Analytics-driven experimentation | Statsig |
| Budget-conscious teams | Unleash (OSS) or DevCycle (free tier) |
Step 3: Add a cleanup tool
This is the piece most teams are missing. Your cleanup tool should:
- Detect flags in your source code across all languages your team uses
- Track flag lifecycle from introduction through removal
- Identify stale flags based on configurable criteria (age, rollout status, evaluation frequency)
- Generate cleanup PRs that safely remove dead flag code
- Integrate into your workflow so cleanup happens as part of normal development, not as a separate initiative
Tools like FlagShark provide this as a managed service with zero-config setup. Piranha provides it as an open-source engine that you configure and host yourself. Either approach closes the lifecycle gap -- choose based on your team's constraints.
Step 4: Connect the two halves
The highest-value integration is connecting your management platform's flag status with your cleanup tool's code tracking:
- When a flag is archived in the management platform, the cleanup tool should know immediately
- When a flag's code references are removed, the management platform should archive the flag
- Staleness criteria should incorporate data from both systems (time since archive + time since code change + evaluation frequency)
This bidirectional connection ensures no flag falls into the gap between systems.
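A sketch of what a combined staleness criterion might look like. Field names and the 30-day threshold are hypothetical, not any vendor's schema, and evaluation frequency is omitted for brevity; real tools make all of this configurable:

```go
package main

import "fmt"

// FlagStatus merges signals from both halves of the stack.
type FlagStatus struct {
	ArchivedInPlatform  bool // from the management platform
	RolloutPercent      int  // from the management platform
	DaysSinceCodeChange int  // from the cleanup tool's code tracking
}

// isStale combines the criteria: archived flags are cleanup candidates
// immediately; otherwise a flag must be fully rolled out AND untouched
// in code for 30+ days before it qualifies.
func isStale(s FlagStatus) bool {
	if s.ArchivedInPlatform {
		return true
	}
	return s.RolloutPercent == 100 && s.DaysSinceCodeChange >= 30
}

func main() {
	fmt.Println(isStale(FlagStatus{ArchivedInPlatform: true}))                     // true
	fmt.Println(isStale(FlagStatus{RolloutPercent: 100, DaysSinceCodeChange: 45})) // true
	fmt.Println(isStale(FlagStatus{RolloutPercent: 50, DaysSinceCodeChange: 10}))  // false
}
```

The point is that neither system alone has enough signal: the platform knows rollout state but not code activity, and the cleanup tool knows code activity but not rollout state.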
Step 5: Establish process guardrails
Tools alone aren't sufficient. Embed flag lifecycle management into your team's processes:
- Definition of done includes flag cleanup plan (not just flag creation)
- Flag review as a standing agenda item in sprint retrospectives
- Cleanup SLAs: flags must be removed within X days of archival
- Metrics: track flag debt alongside other engineering health metrics
Feature flag management platforms are essential tools for modern software development. They enable safe rollouts, powerful experimentation, and instant operational controls. But they solve only half the problem. Without a complementary cleanup strategy, every flag you create becomes permanent technical debt the moment its rollout completes.
The teams that thrive with feature flags are the ones that invest in the full lifecycle -- from creation through removal. The management platform handles the first half. A cleanup tool handles the second. Together, they transform feature flags from a growing liability into a sustainable advantage. The gap between management and cleanup is where flag debt lives. Close the gap, and the debt stops accumulating.