Your CI/CD pipeline used to take eight minutes. Now it takes twenty-three. Nobody can pinpoint when the regression happened because it did not happen---it accumulated. Week after week, commit after commit, the pipeline got a little slower and nobody noticed until it crossed the threshold from "grab a coffee" to "context-switch to another task and lose thirty minutes."
You have already done the obvious optimizations. Parallelized test suites. Cached dependencies. Upgraded to faster runners. Trimmed Docker layers. And yet, your pipeline is still slower than it should be. Here is a diagnosis most teams never consider: your stale feature flags are a significant contributor to your pipeline slowdown, and the problem gets worse with every flag you forget to clean up.
The slow pipeline symptoms nobody connects to flags
When teams investigate slow CI/CD pipelines, they focus on the usual suspects: test execution time, dependency installation, Docker image builds, deployment steps. Feature flags rarely enter the conversation because the connection is indirect. Flags do not add a single slow step to your pipeline---they make every existing step marginally slower. Death by a thousand cuts.
Here is how the degradation works across each stage of a typical pipeline.
Build time inflation
Every feature flag introduces conditional logic, and that conditional logic creates additional code paths that the compiler, bundler, or interpreter must process. A single flag adds two branches; ten independent stale flags in a module multiply into 2^10 = 1,024 theoretical code paths. The build tools do not know which paths are dead---they process all of them.
The impact on build times scales with the number of stale flags. Exact numbers vary with codebase size and language, but the pattern is consistent: more stale flags mean more dead code for build tools to process, larger bundles, and slower compilation.
The impact compounds with codebase size. For a frontend application with a 2 MB production bundle, 100 stale flags can add 300-500 KB of dead code that webpack or esbuild faithfully processes, tree-shakes (partially), and bundles on every single build. For Go services, the compiler must type-check and compile both branches of every flag conditional, even if one branch has not executed in production for six months.
A concrete example. Consider a React application with a stale flag controlling a dashboard component:
function AnalyticsDashboard({ user }: Props) {
  const showNewDashboard = useFeatureFlag('analytics_dashboard_v2');

  if (showNewDashboard) {
    return <NewAnalyticsDashboard user={user} />;
  }

  // This branch has been dead for 4 months
  return <LegacyAnalyticsDashboard user={user} />;
}
The LegacyAnalyticsDashboard component imports its own dependencies: a charting library, utility functions, CSS modules, maybe even a deprecated API client. Those imports pull in their own dependencies. The bundler follows the entire import chain because it cannot know at build time that this branch will never execute. If tree-shaking is imperfect---and it always is, especially with side effects---kilobytes of dead code end up in the production bundle.
Multiply this by 50 stale flags across a codebase, and you have a meaningful chunk of your build time spent processing code that serves no purpose.
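The branch arithmetic above is easy to sketch. The per-flag dead-code size below is an illustrative assumption chosen to match the ranges discussed earlier, not a measurement:

```python
# Sketch: how stale flags multiply theoretical code paths and dead bundle weight.
# The 4 KB-per-flag figure is an illustrative assumption, not a measurement.

def theoretical_paths(flag_count: int) -> int:
    """Each independent flag doubles the number of code paths."""
    return 2 ** flag_count

def dead_code_kb(stale_flags: int, kb_per_flag: float = 4.0) -> float:
    """Rough dead-code estimate if each stale flag guards ~4 KB of unused code."""
    return stale_flags * kb_per_flag

print(theoretical_paths(10))   # 10 flags in one module -> 1024 theoretical paths
print(dead_code_kb(100))       # 100 stale flags -> 400.0 KB, within the 300-500 KB range
```

Plugging in your own per-flag estimate is the point: even a conservative figure lands well inside the bundle-bloat range quoted above.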
Test suite bloat
This is where stale flags inflict the most damage on pipeline speed. Feature flags multiply the number of code paths your test suite needs to cover, and most teams test both sides of every flag---including flags that have been permanently enabled for months.
The math is unforgiving. If your test suite exercises both states of every feature flag, each flag doubles the test matrix for the code it touches. Three independent flags in the same module create 8 test combinations. Five flags create 32. In practice, teams do not test every permutation, but they typically test at least both states of each flag independently, which still means every stale flag adds redundant test runs.
The impact adds up quickly. If your test suite exercises both states of every feature flag, each stale flag keeps at least two flag-specific test runs alive, one of which covers a branch that can never execute. Fifty stale flags can easily mean 100+ extra tests, and when some of those are integration tests running against real services or databases, the time cost grows substantially.
And these are not just any tests. Flag-related tests often involve mocking the flag evaluation service, setting up different application states for each branch, and running assertions against both code paths. Integration tests are particularly expensive: a single stale flag in an API endpoint might trigger duplicate end-to-end test runs that each take 10-30 seconds.
# These tests run on every CI build.
# The flag has been 100% enabled since October.
# Both tests pass. Both are completely pointless.
class TestCheckoutFlow:
    def test_checkout_with_new_payment_v2_enabled(self, mock_flags):
        mock_flags.set("new_payment_flow_v2", True)
        response = self.client.post("/checkout", data=self.valid_order)
        assert response.status_code == 200
        assert "stripe_payment_intent" in response.json()

    def test_checkout_with_new_payment_v2_disabled(self, mock_flags):
        mock_flags.set("new_payment_flow_v2", False)
        response = self.client.post("/checkout", data=self.valid_order)
        assert response.status_code == 200
        assert "legacy_payment_token" in response.json()
The second test is dead weight. It tests a code path that will never execute in production, but it runs on every build, every PR, every merge to main. Across 50 stale flags, this dead testing easily adds 5-10 minutes to your pipeline.
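A back-of-the-envelope model makes the "5-10 minutes" figure concrete. The per-test duration and dead fraction below are illustrative assumptions; substitute your own suite's numbers:

```python
# Sketch: estimated CI time spent running tests for dead flag branches.
# All inputs are illustrative assumptions, not measurements.

def redundant_test_minutes(stale_flags: int,
                           tests_per_flag: int = 2,
                           dead_fraction: float = 0.5,
                           seconds_per_test: float = 12.0) -> float:
    """If each stale flag keeps tests_per_flag tests alive and half of them
    exercise a branch that can never run, estimate wasted minutes per build."""
    dead_tests = stale_flags * tests_per_flag * dead_fraction
    return dead_tests * seconds_per_test / 60

# 50 stale flags, 2 tests each, half covering the dead branch at ~12 s apiece
print(round(redundant_test_minutes(50), 1))  # -> 10.0 minutes per build
```

With slower integration tests in the mix, the `seconds_per_test` assumption rises and the wasted minutes rise with it.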
Docker image bloat
Stale flags directly inflate Docker image sizes. Dead code branches, their dependencies, and their assets all get packaged into your container images. Larger images mean slower pushes to your registry, slower pulls by your orchestrator, and slower cold starts for serverless functions.
# Your production Dockerfile faithfully includes
# everything the build produces, including dead code
COPY --from=builder /app/dist /app/dist
# The /app/dist directory contains:
# - 1.8 MB of active code
# - 400 KB of code behind stale flags that never executes
# - 200 KB of dependencies only used by dead flag branches
For teams deploying to AWS Lambda, where cold start time is directly correlated with package size, the impact is even more acute. Every megabyte of dead code adds 5-15 ms to cold start times. For latency-sensitive services, this is not theoretical---it shows up in P99 latency metrics.
Code review velocity
Stale flags do not just slow automated pipeline stages---they slow humans too. Pull requests that touch code near stale flags force reviewers to mentally parse dead branches, understand flag states, and verify that changes do not break code paths that are not even active.
The effect is predictable: reviews of code near stale flags take longer, require more back-and-forth, and produce less confident approvals, because reviewers must reason about flag states they do not fully understand.
When reviewers encounter code like this, they have to answer questions that the code alone cannot answer:
func ProcessOrder(ctx context.Context, order *Order) error {
    if flags.IsEnabled("order_validation_v3") {
        if err := validateOrderV3(ctx, order); err != nil {
            return fmt.Errorf("validation failed: %w", err)
        }
    } else {
        if err := validateOrderLegacy(order); err != nil {
            return err
        }
    }

    // Is this flag still relevant?
    // Which branch is active in production?
    // Do I need to update both branches?
    // What happens if someone toggles this flag?
    if flags.IsEnabled("new_pricing_engine") {
        order.Total = calculatePricingV2(ctx, order)
    } else {
        order.Total = calculateLegacyPricing(order)
    }

    return persistOrder(ctx, order)
}
A reviewer seeing this function must now verify the behavior of four possible flag combinations, even if only one combination is active in production. The review takes longer, comments are less precise, and subtle bugs in the dead paths go unnoticed because reviewers focus their attention on the active paths.
Quantifying the full pipeline impact
Taken individually, each of these effects seems minor. A few percent here, a minute there. But pipeline time is not experienced in isolation---it compounds across every developer, every PR, and every day.
The aggregate impact is what matters. Add up the build time overhead, the redundant test execution, the Docker image bloat, and the slower code reviews across every developer, every PR, and every day, and the cost becomes significant. For a mid-size engineering team, the cumulative hours lost to stale-flag pipeline drag can easily amount to multiple full-time engineers' worth of wasted capacity per year.
This does not account for the context-switching cost when developers wait for slow builds. Research on workplace interruptions shows that recovering focus after a context switch takes significant time, and slow CI pipelines force these context switches multiple times per day.
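A simple cost model shows how the drag converts into engineer-years. Every input below is an illustrative assumption for a hypothetical mid-size team; plug in your own numbers:

```python
# Sketch: annual cost of stale-flag pipeline drag, in engineer-years of waiting.
# Every input is an illustrative assumption, not a measurement.

def wasted_engineer_years(engineers: int = 30,
                          builds_per_dev_per_day: int = 4,
                          extra_minutes_per_build: float = 8.0,
                          working_days: int = 230,
                          hours_per_engineer_year: float = 1800.0) -> float:
    """Extra pipeline minutes, summed across the team for a year,
    expressed as full-time-engineer-years of waiting."""
    wasted_hours = (engineers * builds_per_dev_per_day *
                    extra_minutes_per_build / 60 * working_days)
    return wasted_hours / hours_per_engineer_year

# 30 engineers, 4 builds a day, 8 extra minutes per build
print(round(wasted_engineer_years(), 2))  # -> 2.04 engineer-years per year
```

Note this counts only raw waiting time; the context-switching cost described next comes on top.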
The compounding effect: why it gets worse over time
Stale flags do not just cause a fixed amount of pipeline degradation. The problem compounds because flags interact with each other and with the codebase in non-linear ways.
Flag accumulation rate outpaces cleanup
Most teams create flags faster than they remove them. In our experience, the typical engineering team creates flags at a rate that significantly outpaces cleanup, leading to net accumulation month over month. That means pipeline degradation accelerates quarter over quarter.
Without active cleanup, the stale flag count grows steadily. Each quarter adds more dead code to your builds, more redundant tests to your suite, and more bloat to your images. By the end of a year without cleanup, your pipeline can be materially slower than it was at the start, and the degradation will not plateau because the accumulation does not plateau.
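The accumulation trajectory is easy to model. The creation and removal rates below are illustrative assumptions in line with the pattern described above:

```python
# Sketch: net stale-flag accumulation when creation outpaces cleanup.
# The monthly rates are illustrative assumptions, not measurements.

def stale_flags_after(months: int,
                      created_per_month: int = 12,
                      removed_per_month: int = 4) -> int:
    """Linear net accumulation: each month adds created - removed flags."""
    return months * (created_per_month - removed_per_month)

for quarter in range(1, 5):
    print(f"After Q{quarter}: {stale_flags_after(quarter * 3)} stale flags")
# After Q4: 96 stale flags, with no plateau in sight
```

Even with modest rates, a year of net accumulation leaves a flag count large enough to produce the build, test, and image effects described in the earlier sections.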
Flags create maintenance gravity
Stale flags make code harder to modify, which makes developers less likely to clean up the surrounding code, which makes more code stale. A developer who encounters a function with three stale flags and wants to add a small feature will route around the complexity rather than clean it up. The function grows. More flags accumulate. The build processes more dead code. The pipeline slows further.
Flag dependencies create hidden coupling
Flags that depend on other flags---where one flag's behavior changes based on another flag's state---create exponential complexity. When either flag becomes stale, the interaction is nearly impossible to reason about. Teams leave both flags in place "just in case," doubling the dead code and the test burden.
The fix: systematic flag cleanup and its measurable impact
The good news is that flag cleanup produces immediate, measurable improvements to pipeline speed. Unlike many performance optimizations that require architectural changes, removing stale flags is purely subtractive work---you are deleting code, and the pipeline gets faster.
The impact of flag cleanup on pipeline speed
Teams that conduct systematic flag cleanup consistently report meaningful improvements across the board: faster build times, shorter test suites, smaller Docker images, quicker code reviews, and reduced serverless cold start times. The exact numbers vary depending on the codebase and how many flags are removed, but the improvements are immediate because every build, test run, and Docker build benefits from processing less code.
In the case study covered in our post on removing 500 stale flags, the team saw a 20% reduction in CI build time, a 37% reduction in test suite duration, and a 29% increase in deployment frequency.
Practical cleanup approaches
Manual audit and removal. The most straightforward approach: inventory all flags, identify which are stale (held at 100% rollout, or fully disabled, for 30+ days), and create PRs to remove each one. This works for small flag counts (under 20) but does not scale: manually removing a single flag typically takes 30-60 minutes including code changes, test updates, and review.
Scheduled cleanup sprints. Dedicate a fixed time each quarter (a "flag cleanup day" or a sprint rotation) to removing stale flags. This creates cultural momentum but relies on discipline and prioritization against feature work.
Automated detection and removal. The most scalable approach: use AST-based analysis to detect stale flags in your codebase and automatically generate cleanup PRs. Tools that use tree-sitter parsing can understand the syntax structure across multiple languages and safely remove flag conditionals, dead branches, and orphaned imports.
This is the approach FlagShark takes---it monitors your GitHub repositories for flag changes in every PR, tracks flag lifecycle from creation through full rollout, and generates cleanup PRs when flags become stale. Because it uses tree-sitter AST parsing rather than regex, it handles complex flag patterns across 11 programming languages without false positives or unsafe removals.
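The staleness criterion from the manual-audit approach above can be sketched in a few lines. The inventory format here is a hypothetical example; real flag platforms expose the same fields through their APIs:

```python
# Sketch: filter a flag inventory down to stale candidates.
# The inventory structure is a hypothetical example for illustration.
from datetime import date, timedelta

def stale_flags(inventory: list[dict], today: date,
                min_age_days: int = 30) -> list[str]:
    """A flag is stale if it has sat at 100% or 0% rollout for min_age_days."""
    cutoff = today - timedelta(days=min_age_days)
    return [
        f["key"] for f in inventory
        if f["rollout"] in (0, 100) and f["last_rollout_change"] <= cutoff
    ]

inventory = [
    {"key": "analytics_dashboard_v2", "rollout": 100,
     "last_rollout_change": date(2024, 10, 1)},   # fully on for months: stale
    {"key": "new_onboarding", "rollout": 40,
     "last_rollout_change": date(2025, 1, 10)},   # mid-rollout: keep
]
print(stale_flags(inventory, today=date(2025, 2, 1)))  # -> ['analytics_dashboard_v2']
```

This filter is the easy part; safely removing the conditional, the dead branch, and the orphaned imports is where AST-based tooling earns its keep.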
The cleanup ROI
For teams considering whether flag cleanup is worth prioritizing against feature work, the math is straightforward: the ongoing cost of stale flags (developer wait time, redundant test execution, cognitive overhead during reviews, larger images and slower deploys) accumulates every single day. The one-time cost of removing a flag is typically 30-60 minutes of engineering time per flag.
Even manual cleanup pays for itself quickly. Automated cleanup, where the tool generates the PR and you just review and merge, pays for itself almost immediately.
Building flag hygiene into your pipeline
The most effective long-term strategy is not periodic cleanup campaigns but continuous flag hygiene integrated into your existing CI/CD workflow.
Step 1: Make flag accumulation visible
Add a flag inventory step to your CI pipeline that reports the current count and age distribution of flags on every build. When the number is visible, teams naturally prioritize cleanup.
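A minimal version of that inventory step, assuming the flag list comes from your flag platform's API (the bucket boundaries are a suggested convention, not a standard):

```python
# Sketch: a CI step that prints flag count and age distribution on every build.
# The age buckets are a suggested convention, not a standard.

def age_report(flag_ages_days: list[int]) -> dict[str, int]:
    """Bucket flags by age so the accumulation trend is visible in CI logs."""
    buckets = {"<30d": 0, "30-90d": 0, ">90d": 0}
    for age in flag_ages_days:
        if age < 30:
            buckets["<30d"] += 1
        elif age <= 90:
            buckets["30-90d"] += 1
        else:
            buckets[">90d"] += 1
    return buckets

ages = [5, 12, 45, 60, 120, 200, 340]  # ages would come from your flag platform
print(f"{len(ages)} flags:", age_report(ages))
```

The ">90d" bucket is the number your team will start competing to shrink once it appears in every build log.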
Step 2: Set expiration expectations
Establish a team norm that release flags have a 30-day maximum lifespan and experiment flags have a 14-day maximum. Enforce these through CI warnings that escalate to blocking checks as flags age.
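The escalation logic can be sketched as a simple CI verdict. The 30-day budget comes from the norm above; the 2x grace window before blocking is an assumption you would tune to your team:

```python
# Sketch: escalate from warning to hard failure as a flag ages past its budget.
# The 2x grace window before blocking is an assumed policy, not a standard.

def check_flag_age(age_days: int, max_age_days: int = 30) -> str:
    """Return the CI verdict: pass within budget, warn past it, fail past 2x."""
    if age_days <= max_age_days:
        return "pass"
    if age_days <= 2 * max_age_days:
        return "warn"   # annotate the build, do not block yet
    return "fail"       # block the merge until the flag is removed

print(check_flag_age(20), check_flag_age(45), check_flag_age(75))  # pass warn fail
```

For experiment flags, the same function with `max_age_days=14` enforces the shorter budget.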
Step 3: Automate the removal
Manual flag removal does not scale. The cognitive effort of identifying stale flags, understanding their code impact, writing safe removal PRs, and updating tests is substantial enough that it perpetually loses priority to feature work. Automated tools that generate cleanup PRs reduce the effort from "create and review a PR" to "review and merge a PR"---a 60-80% reduction in effort that makes cleanup sustainable.
Step 4: Track pipeline metrics alongside flag counts
Correlate your pipeline speed metrics with your flag count over time. When your team can see the direct relationship between "we removed 30 flags last month" and "our build time dropped 4 minutes," the motivation for ongoing cleanup becomes self-sustaining.
Your CI/CD pipeline is not slow because of one big problem. It is slow because of hundreds of small ones, and stale feature flags are responsible for more of those small problems than most teams realize. Every flag left in your codebase after it has served its purpose adds dead code to your builds, redundant runs to your tests, dead weight to your containers, and cognitive overhead to your reviews.
The path from a twenty-three-minute pipeline back to an eight-minute one does not require a heroic rewrite or an expensive infrastructure upgrade. It requires removing the dead code your team has been unconsciously accumulating, one stale flag at a time. Start by counting your stale flags. The number will be higher than you expect, and the pipeline improvement from removing them will be larger than you predict.