You have 347 feature flags in production. Is that too many?
The question seems simple. The answer is not. Engineering managers ask this when something already feels wrong -- builds are slower, code reviews take longer, new hires stare at the codebase in confusion, and nobody can confidently explain what half the flags do. But putting a number on "too many" requires context that most advice articles skip entirely.
This post provides that context. Drawing on published data from LaunchDarkly and Unleash, public engineering blog posts, and our own experience working with engineering teams, we will establish concrete thresholds for when flag counts cross from healthy to harmful. You will walk away with benchmarks you can compare against, a self-assessment checklist, and a framework for setting limits that make sense for your team.
The short answer: it depends (but here are the numbers)
There is no universal "safe" number of feature flags, just as there is no universal "safe" amount of technical debt. The right number depends on team size, deployment cadence, codebase complexity, and -- critically -- your cleanup velocity.
That said, based on publicly available data and our own experience, we can identify clear warning thresholds. Here are the benchmarks we use:
| Metric | Healthy | Caution Zone | Danger Zone |
|---|---|---|---|
| Flags per engineer | 1-3 | 4-6 | 7+ |
| Flags per 1,000 lines of code (KLOC) | 0.5-1.5 | 1.5-3.0 | 3.0+ |
| Flags per repository | 5-30 | 30-80 | 80+ |
| Stale flag percentage (>90 days, non-operational) | <20% | 20-40% | 40%+ |
| Net flag growth per month | 0 (balanced) | +1 to +5 | +6 or more |
| Average flag age | <45 days | 45-90 days | 90+ days |
The single most important metric is not the total count -- it is the ratio of flag creation to flag removal. An organization with 300 flags and a balanced creation/removal rate is healthier than one with 50 flags where nothing ever gets cleaned up.
Industry benchmarks: What the data actually says
LaunchDarkly's published data
LaunchDarkly, the largest feature flag management platform, has published data points about their customer base. Their public guidance indicates that the median customer maintains between 50 and 200 flags per project, with enterprise accounts averaging significantly higher. Their "code references" feature -- which scans codebases for flag usage -- frequently finds that a significant percentage of flags in a typical project have no remaining code references, meaning the flag exists in the management platform but has already been removed from (or was never added to) the codebase.
LaunchDarkly's own best practice guidance recommends treating flags as temporary by default and establishing expiration policies. The fact that the market leader in flag management explicitly tells customers to remove flags is telling. Even the company that profits from more flags acknowledges the debt problem.
Unleash open-source data
Unleash, the leading open-source feature flag platform, publishes usage metrics from their hosted offering. Their data shows a similar pattern: organizations create flags at approximately 3x the rate they archive or remove them. The average Unleash project accumulates 8-15 new flags per month while removing only 3-5.
Unleash introduced "potentially stale" flag detection in their platform specifically because users requested it -- a signal that flag accumulation was causing enough pain for users to ask for tooling to address it.
Patterns across the industry
Based on our experience working with engineering teams at various stages, public conference talks, and published case studies, the following rough estimates emerge for flag counts by organization size:
| Organization Size | Total Active Flags | Flags Per Engineer | Stale Percentage |
|---|---|---|---|
| Startup (<50 eng) | 15-40 | 2-4 | Lower but growing |
| Scaleup (50-200 eng) | 80-200 | 4-6 | Moderate |
| Mid-market (200-1K eng) | 200-600 | 5-8 | High |
| Enterprise (1K+ eng) | 500-5,000+ | 4-7 | Highest |
The pattern is consistent: organizations at every scale are creating flags faster than they remove them, and the stale percentage increases with organizational size. But notice the enterprise "flags per engineer" number actually dips compared to mid-market. This is because large enterprises tend to have more centralized flag management and stricter governance -- but even with those controls, their stale percentage is the highest.
The three metrics that actually matter
Raw flag count is a vanity metric. It tells you something, but not enough to act on. These three derived metrics are far more diagnostic.
1. Flag density: Flags per KLOC
Flag density measures how thoroughly flags are woven into your codebase. It normalizes for codebase size, making it comparable across projects and teams.
How to calculate it: Count all feature flag evaluations in your codebase (calls to your flag SDK) and divide by thousands of lines of code.
Flag Density = Total flag evaluations / (Total lines of code / 1000)
Benchmarks:
| Flag Density (per KLOC) | Assessment | Typical Scenario |
|---|---|---|
| 0.1-0.5 | Low | Small team, few flags, or very large codebase |
| 0.5-1.5 | Healthy | Active flag usage with reasonable management |
| 1.5-3.0 | Elevated | Heavy flag usage, likely some stale flags |
| 3.0-5.0 | High | Flag-heavy codebase, cleanup needed |
| 5.0+ | Critical | Flags dominating code paths, urgent cleanup required |
Note the distinction between flag evaluations and unique flags. A single flag evaluated in 15 places creates more complexity (and more removal work) than 15 flags each evaluated once. Tools that use AST parsing -- like tree-sitter-based detection -- can count both, giving you a more accurate density picture than simple text search.
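To make this concrete, here is a minimal sketch of a density calculation using plain text search. It assumes your flag SDK exposes an evaluation call along the lines of `.variation("flag-key", ...)` (LaunchDarkly-style); substitute your own SDK's method. As the note above suggests, an AST-based scanner will be more accurate than a regex, but this is enough for a first estimate of both evaluations and unique flags.

```python
# Minimal flag-density sketch using text search. The ".variation(" pattern is
# an assumption (LaunchDarkly-style SDKs); adjust it for your own SDK.
import re
from pathlib import Path

FLAG_CALL_PATTERN = re.compile(r'\.variation\(\s*["\']([\w.-]+)["\']')
SOURCE_EXTENSIONS = {".py", ".ts", ".tsx", ".js", ".go", ".java"}

def flag_density(repo_root: str) -> None:
    total_lines = 0
    evaluations = []  # one (flag_key, file) entry per call site found
    for path in Path(repo_root).rglob("*"):
        if not path.is_file() or path.suffix not in SOURCE_EXTENSIONS:
            continue
        text = path.read_text(errors="ignore")
        total_lines += text.count("\n") + 1
        evaluations += [(key, path) for key in FLAG_CALL_PATTERN.findall(text)]

    kloc = total_lines / 1000
    unique_flags = {key for key, _ in evaluations}
    print(f"{len(evaluations)} evaluations of {len(unique_flags)} flags across {kloc:.1f} KLOC")
    print(f"Flag density: {len(evaluations) / kloc:.2f} evaluations per KLOC")

flag_density(".")
```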
2. Cleanup ratio: Flags removed / Flags created
The cleanup ratio is the single best predictor of whether your flag count will become a problem. It measures organizational discipline, not just current state.
Cleanup Ratio = Flags removed this month / Flags created this month
Benchmarks:
| Cleanup Ratio | Assessment | Trajectory |
|---|---|---|
| >1.0 | Excellent | Actively reducing debt |
| 0.8-1.0 | Healthy | Roughly balanced, slight growth manageable |
| 0.5-0.8 | Concerning | Accumulating debt, intervention needed within 6 months |
| 0.3-0.5 | Poor | Significant accumulation, process/tooling gaps |
| <0.3 | Critical | Creating 3x+ more than removing, flag graveyard forming |
Based on what we have seen across codebases, the typical ratio is around 0.33 -- meaning teams create roughly three flags for every one they remove. This is how codebases end up with hundreds of stale flags despite nobody intending for it to happen.
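For illustration, here is a minimal sketch of the calculation, assuming you can export flag creation and archival timestamps from your flag management platform or audit spreadsheet; the data shapes below are illustrative, not a real API schema.

```python
# Minimal cleanup-ratio sketch: removed-per-month divided by created-per-month.
# The input lists are assumed to come from a platform export or audit spreadsheet.
from collections import Counter
from datetime import date

def cleanup_ratio_by_month(created: list[date], removed: list[date]) -> dict[str, float]:
    created_per_month = Counter(d.strftime("%Y-%m") for d in created)
    removed_per_month = Counter(d.strftime("%Y-%m") for d in removed)
    return {
        month: removed_per_month[month] / created_per_month[month]
        for month in sorted(created_per_month)
    }

# Three flags created and one removed in March gives a ratio of 0.33.
created = [date(2024, 3, 2), date(2024, 3, 10), date(2024, 3, 21)]
removed = [date(2024, 3, 28)]
print(cleanup_ratio_by_month(created, removed))  # {'2024-03': 0.333...}
```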
3. Stale flag percentage
Stale flags are flags that have completed their purpose but remain in the codebase. A release flag that has been 100% enabled for six months is stale. An experiment flag from a test that concluded three months ago is stale. An operational kill switch reviewed and re-approved quarterly is not stale.
How to calculate it: Count flags older than your expiration threshold (commonly 90 days for release flags) that are not documented as long-lived operational flags. Divide by total flag count.
Stale Percentage = Stale flags / Total flags * 100
Benchmarks:
| Stale Percentage | Assessment | Impact |
|---|---|---|
| <15% | Excellent | Minimal debt, strong lifecycle management |
| 15-30% | Good | Some debt, manageable with periodic cleanup |
| 30-50% | Elevated | Noticeable developer friction, code reviews slowed |
| 50-70% | High | Significant productivity drain, onboarding impacted |
| 70%+ | Critical | Flag graveyard; major investment needed to recover |
In our experience, the typical stale percentage for most organizations is well above 50%. If your number is below 30%, you are doing better than most teams we have worked with.
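Sketched below is one way to compute it, assuming you have an inventory with a category and creation date per flag; the `Flag` record is hypothetical, and operational and permission flags are excluded per the definition above.

```python
# Minimal stale-percentage sketch. The Flag record is a hypothetical shape you
# would build from a platform export or audit spreadsheet.
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Flag:
    key: str
    category: str  # "release", "experiment", "operational", "permission", "migration"
    created: date

LONG_LIVED = {"operational", "permission"}   # excluded from staleness
STALE_AFTER = timedelta(days=90)             # expiration threshold for temporary flags

def stale_percentage(flags: list[Flag], today: date) -> float:
    stale = [
        f for f in flags
        if f.category not in LONG_LIVED and today - f.created > STALE_AFTER
    ]
    return 100 * len(stale) / len(flags) if flags else 0.0
```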
Why raw count is misleading
Consider two teams:
Team A: 200 flags, cleanup ratio of 0.9, average flag age of 38 days, stale percentage of 18%. They create and remove flags aggressively as part of a mature trunk-based development workflow. Flags are temporary by design and culture.
Team B: 45 flags, cleanup ratio of 0.1, average flag age of 210 days, stale percentage of 78%. They adopted flags two years ago and have never removed one. Every flag is load-bearing in production, and nobody is confident about removing any of them.
Team A has 4x the flags but dramatically better flag health. Team B has a small number of flags but is sitting on a minefield. This is why "how many is too many" requires context beyond the count.
The compounding cost of excess flags
Flag debt compounds in ways that are not immediately obvious. Each additional stale flag does not add cost linearly -- it adds cost geometrically because of interactions between flags.
The testing combinatorial problem
Every boolean flag doubles the theoretical state space of your application. In practice, teams do not test every combination, but they do need to reason about them during code reviews, debugging, and incident response.
| Active Flags | Theoretical Combinations | Realistic Test Paths | Review Complexity |
|---|---|---|---|
| 10 | 1,024 | 20-30 | Manageable |
| 25 | 33+ million | 50-100 | Requires discipline |
| 50 | 1.1 quadrillion | 100-200 | Significant overhead |
| 100 | 1.27 x 10^30 | 200-400 | Requires tooling |
| 200 | 1.61 x 10^60 | 400-800 | Unsustainable without automation |
Even if you only test the "important" combinations, the cognitive load of understanding which combinations matter grows with every flag added. This is where developer velocity silently erodes.
The onboarding multiplier
New engineers must understand existing flags to work effectively. More flags means more conditional logic to learn, more context to absorb, and more time before a new hire can contribute confidently. In our experience, teams with hundreds of stale flags consistently report that new engineer onboarding takes meaningfully longer, and this cost multiplies with every new hire.
The incident response tax
During production incidents, engineers must navigate flag state to diagnose issues. Every flag in a code path is a potential variable that could explain the behavior.
In our experience, flag-heavy codebases consistently show longer mean time to resolution (MTTR). The more flags in a code path, the more variables an engineer must consider during diagnosis, and the longer it takes to rule out flag-related causes.
An extra 30 minutes on a P1 incident at 2 AM is not just a cost -- it is a morale event. Engineers who repeatedly deal with flag-induced debugging complexity develop learned helplessness that depresses velocity long after the incident is resolved.
The self-assessment checklist
Use this checklist to evaluate your team's flag health. Score each item honestly -- the goal is diagnosis, not perfection.
Quantitative signals (measure these)
- Flags per engineer is above 6. Each engineer is responsible for more flags than they can reasonably track.
- Stale flag percentage exceeds 40%. Nearly half your flags have completed their purpose yet remain in the codebase.
- Cleanup ratio is below 0.5. You are creating flags at least 2x faster than removing them.
- Average flag age exceeds 90 days. Most flags are living well beyond a typical release cycle.
- No flag has been removed in the past 30 days. Flag removal is not happening at all.
Qualitative signals (observe these)
- New hires ask "what does this flag do?" more than once per day. Flags are obscuring rather than clarifying the codebase.
- Code reviews include comments like "is this flag still needed?" Reviewers cannot tell if flags are active or stale.
- Nobody knows who owns specific flags. Flag ownership has diffused to the point of anonymity.
- Engineers are afraid to remove flags. The team has learned helplessness around flag cleanup due to past incidents.
- You have flags referencing features that shipped over a year ago. Release flags have become permanent architecture.
- Flag names include "temp", "test", "v2", "old", or "new". Naming suggests these were always intended to be temporary.
- Your flag management platform shows flags with zero evaluations. Dead flags exist in configuration but not in code (or vice versa).
Scoring
| Red Flags Checked | Assessment | Recommended Action |
|---|---|---|
| 0-2 | Healthy | Maintain current practices, consider preventive policies |
| 3-5 | Caution | Establish cleanup cadence, set flag expiration policies |
| 6-8 | Concerning | Invest in cleanup tooling and process, audit current flags |
| 9-11 | Critical | Dedicate a sprint to flag cleanup, implement automated lifecycle management |
| All 12 | Emergency | Flag debt is a top-3 engineering productivity issue; treat accordingly |
Setting the right thresholds for your team
Instead of adopting universal limits, establish thresholds calibrated to your organization's context.
Step 1: Establish your baseline
Run a one-time audit. Count total flags, categorize them (release, experiment, operational, permission), calculate the three key metrics (density, cleanup ratio, stale percentage), and document flag age distribution.
Tools like FlagShark can automate this audit across repositories by scanning your codebase with tree-sitter AST parsing and producing a complete inventory. If you prefer a manual approach, search your codebase for your flag SDK's evaluation method calls and build a spreadsheet.
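As a rough illustration of the manual route, the sketch below scans for SDK evaluation calls and writes a per-flag inventory to a CSV you can sort and categorize in a spreadsheet. The `.variation(` pattern and the `*.py` glob are assumptions; adjust both for your SDK and languages.

```python
# Minimal audit sketch: build a flag inventory CSV from text search.
import csv
import re
from collections import defaultdict
from pathlib import Path

PATTERN = re.compile(r'\.variation\(\s*["\']([\w.-]+)["\']')  # assumed SDK call shape

inventory: dict[str, list[str]] = defaultdict(list)
for path in Path(".").rglob("*.py"):  # extend the glob to your languages
    for key in PATTERN.findall(path.read_text(errors="ignore")):
        inventory[key].append(str(path))

with open("flag_inventory.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["flag_key", "evaluation_count", "files"])
    for key, files in sorted(inventory.items()):
        writer.writerow([key, len(files), ";".join(sorted(set(files)))])
```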
Step 2: Categorize flags by intended lifetime
Not all flags should have the same expiration threshold. Establish categories with distinct expectations:
| Flag Category | Description | Expected Lifetime | Expiration Policy |
|---|---|---|---|
| Release flags | Gate new features during rollout | 1-4 weeks | Remove within 30 days of 100% rollout |
| Experiment flags | A/B tests and experiments | 2-8 weeks | Remove within 14 days of experiment conclusion |
| Operational flags | Kill switches, circuit breakers | Indefinite | Annual review and re-approval |
| Permission flags | Entitlement and access control | Indefinite | Quarterly review |
| Migration flags | Database or service migrations | 2-12 weeks | Remove within 30 days of migration completion |
The crucial distinction: release and experiment flags should be temporary by default. Operational and permission flags are intentionally long-lived and should not count toward your stale percentage. If you do not make this distinction, your stale percentage will be inflated and your team will ignore the metric entirely.
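One way to keep the distinction honest is to encode the category table as policy-as-code that your tooling can read. The sketch below is illustrative: the windows match the table above, and `None` marks long-lived categories that get scheduled reviews instead of expiration dates.

```python
# Minimal policy-as-code sketch of the category table above.
from datetime import date, timedelta

EXPIRATION_DAYS = {
    "release": 30,        # remove within 30 days of 100% rollout
    "experiment": 14,     # remove within 14 days of experiment conclusion
    "migration": 30,      # remove within 30 days of migration completion
    "operational": None,  # annual review instead of expiration
    "permission": None,   # quarterly review instead of expiration
}

def expiration_date(category: str, completed_on: date) -> date | None:
    """Date a flag should be gone by, or None for long-lived categories."""
    days = EXPIRATION_DAYS[category]
    return completed_on + timedelta(days=days) if days is not None else None
```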
Step 3: Set limits and enforce them
Once you have categories and baselines, set explicit limits:
Total flag budget = (Number of engineers * target flags per engineer) + Operational flags
Operational flags are counted separately because they are intentionally long-lived and sit outside the per-engineer target.
For example, a 30-person team targeting 3 flags per engineer with 15 operational flags:
Flag budget = (30 * 3) + 15 = 105 flags
If your current count exceeds the budget, establish a debt reduction target. A reasonable pace is reducing stale flags by 10-15% per month -- aggressive enough to make progress, sustainable enough to not disrupt feature work.
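As a rough illustration, the sketch below encodes the budget formula and estimates how long debt reduction takes at a given monthly pace. It applies the reduction rate to the whole flag count for a ballpark timeline, and the 12% figure is simply one point inside the suggested 10-15% range.

```python
# Minimal sketch of the flag budget and a ballpark debt-reduction timeline.
def flag_budget(engineers: int, target_per_engineer: int, operational_flags: int) -> int:
    return engineers * target_per_engineer + operational_flags

def months_to_budget(current_flags: float, budget: int, monthly_reduction: float = 0.12) -> int:
    months = 0
    while current_flags > budget:
        current_flags *= 1 - monthly_reduction  # remove ~12% of remaining flags each month
        months += 1
    return months

budget = flag_budget(engineers=30, target_per_engineer=3, operational_flags=15)
print(budget)                         # 105
print(months_to_budget(347, budget))  # roughly 10 months at 12% per month
```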
Step 4: Automate enforcement
Manual flag policies fail. In our experience, organizations that rely on process alone (cleanup sprints, manual reviews, "flag Fridays") see temporary improvements that regress within 2-3 months.
Sustainable flag management requires automation at key points in the lifecycle:
- At creation: Require expiration dates, owners, and categories for new flags. Block PR merges that add flags without metadata.
- During lifecycle: Monitor flag age and send alerts when flags approach expiration. Tools like FlagShark track flag lifecycle automatically by analyzing PRs as they are opened and merged.
- At expiration: Generate cleanup PRs automatically when flags exceed their expiration date. Automated cleanup PRs remove the single biggest friction point in flag management: the manual work of identifying stale code paths, removing flag evaluations, and cleaning up dead branches.
- In CI/CD: Fail builds or raise warnings when flag count exceeds your budget or stale percentage exceeds your threshold.
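As one example of the CI/CD point above, here is a minimal gate script that fails the build when either threshold is exceeded. The environment variable names are illustrative, not from any particular tool; the counts would come from your audit scripts or a platform export.

```python
# Minimal CI gate sketch: exit non-zero when flag debt exceeds the agreed limits.
import os
import sys

FLAG_BUDGET = int(os.environ.get("FLAG_BUDGET", "105"))           # assumed budget
STALE_THRESHOLD = float(os.environ.get("STALE_THRESHOLD", "40"))  # stale % ceiling

flag_count = int(os.environ["FLAG_COUNT"])
stale_percentage = float(os.environ["STALE_PERCENTAGE"])

failures = []
if flag_count > FLAG_BUDGET:
    failures.append(f"flag count {flag_count} exceeds budget {FLAG_BUDGET}")
if stale_percentage > STALE_THRESHOLD:
    failures.append(f"stale percentage {stale_percentage:.0f}% exceeds {STALE_THRESHOLD:.0f}%")

if failures:
    print("Flag budget check failed: " + "; ".join(failures))
    sys.exit(1)
print("Flag budget check passed")
```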
When more flags are actually fine
Not every high flag count indicates a problem. Some situations legitimately call for more flags:
Multi-tenant SaaS products often use flags for customer-specific configuration and entitlements. A B2B platform with 500 enterprise customers might have 500+ permission flags that are each intentionally long-lived. These should be categorized as operational/permission flags and excluded from stale calculations.
Platform teams and infrastructure use flags as operational controls -- circuit breakers, gradual migrations, load shedding toggles. A platform team with 50 operational flags is not necessarily unhealthy; it depends on whether those flags are documented, owned, and reviewed.
High-velocity product teams practicing trunk-based development may have elevated flag counts at any given moment because they are shipping daily. If their cleanup ratio is near 1.0 and average flag age is under 30 days, a higher absolute count is healthy -- it reflects velocity, not debt.
The key question is always: are these flags intentional and managed, or accidental and forgotten?
The answer, summarized
How many feature flags is too many? Here is the answer in brief:
Your flag count is too high when your stale percentage exceeds 40%, your cleanup ratio falls below 0.5, or your flags-per-engineer exceeds 6. These thresholds indicate that flag creation has outpaced your ability to manage the lifecycle, and debt is accumulating in ways that will measurably impact developer productivity.
The absolute number matters less than the trend. A team moving from 200 to 180 flags is healthier than a team moving from 40 to 60, regardless of who has "more" flags.
If you are asking the question, you probably already know the answer. Engineering managers do not Google "how many feature flags is too many" when everything is fine. The fact that you are here means something is causing friction. Use the benchmarks and checklist in this post to quantify the problem, then set concrete thresholds and -- most importantly -- automate the enforcement. The teams that treat flag lifecycle as infrastructure rather than discipline are the ones that keep their codebases clean at scale.
Flag count is a symptom. Cleanup velocity is the diagnosis. Automation is the treatment. Measure the three metrics that matter (density, cleanup ratio, stale percentage), compare against the benchmarks above, and establish policies with automated enforcement. The organizations that get this right build faster, ship safer, and spend their engineering budget on features instead of archaeology.