Your CI/CD pipeline used to take eight minutes. Now it takes twenty-three. Nobody can pinpoint when the regression happened because it did not happen---it accumulated. Week after week, commit after commit, the pipeline got a little slower and nobody noticed until it crossed the threshold from "grab a coffee" to "context-switch to another task and lose thirty minutes."
You have already done the obvious optimizations. Parallelized test suites. Cached dependencies. Upgraded to faster runners. Trimmed Docker layers. And yet, your pipeline is still slower than it should be. Here is a diagnosis most teams never consider: your stale feature flags are a significant contributor to your pipeline slowdown, and the problem gets worse with every flag you forget to clean up.
The slow pipeline symptoms nobody connects to flags
When teams investigate slow CI/CD pipelines, they focus on the usual suspects: test execution time, dependency installation, Docker image builds, deployment steps. Feature flags rarely enter the conversation because the connection is indirect. Flags do not add a single slow step to your pipeline---they make every existing step marginally slower. Death by a thousand cuts.
Here is how the degradation works across each stage of a typical pipeline.
Build time inflation
Every feature flag introduces conditional logic, and that conditional logic creates additional code paths that the compiler, bundler, or interpreter must process. A single flag adds two branches; ten independent stale flags in a module multiply into 2^10 = 1,024 theoretical code paths. The build tools do not know which paths are dead---they process all of them.
The impact on build times scales with the number of stale flags. Exact numbers vary with codebase size and language, but the pattern is consistent: more stale flags mean more dead code for build tools to process, larger bundles, and slower compilation.
The impact compounds with codebase size. For a frontend application with a 2 MB production bundle, 100 stale flags can add 300-500 KB of dead code that webpack or esbuild faithfully processes, tree-shakes (partially), and bundles on every single build. For Go services, the compiler must type-check and compile both branches of every flag conditional, even if one branch has not executed in production for six months.
A concrete example. Consider a React application with a stale flag controlling a dashboard component:
function AnalyticsDashboard({ user }: Props) {
  const showNewDashboard = useFeatureFlag('analytics_dashboard_v2');

  if (showNewDashboard) {
    return <NewAnalyticsDashboard user={user} />;
  }

  // This branch has been dead for 4 months
  return <LegacyAnalyticsDashboard user={user} />;
}
The LegacyAnalyticsDashboard component imports its own dependencies: a charting library, utility functions, CSS modules, maybe even a deprecated API client. Those imports pull in their own dependencies. The bundler follows the entire import chain because it cannot know at build time that this branch will never execute. If tree-shaking is imperfect---and it always is, especially with side effects---kilobytes of dead code end up in the production bundle.
Multiply this by 50 stale flags across a codebase, and you have a meaningful chunk of your build time spent processing code that serves no purpose.
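The branch arithmetic above is easy to sketch. The per-flag dead-code size below is an illustrative assumption chosen to match the ranges discussed earlier, not a measurement:

```python
# Sketch: how stale flags multiply theoretical code paths and dead bundle weight.
# The 4 KB-per-flag figure is an illustrative assumption, not a measurement.

def theoretical_paths(flag_count: int) -> int:
    """Each independent flag doubles the number of code paths."""
    return 2 ** flag_count

def dead_code_kb(stale_flags: int, kb_per_flag: float = 4.0) -> float:
    """Rough dead-code estimate if each stale flag guards ~4 KB of unused code."""
    return stale_flags * kb_per_flag

print(theoretical_paths(10))   # 10 flags in one module -> 1024 theoretical paths
print(dead_code_kb(100))       # 100 stale flags -> 400.0 KB, within the 300-500 KB range
```

Plugging in your own per-flag estimate is the point: even a conservative figure lands well inside the bundle-bloat range quoted above.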
Test suite bloat
This is where stale flags inflict the most damage on pipeline speed. Feature flags multiply the number of code paths your test suite needs to cover, and most teams test both sides of every flag---including flags that have been permanently enabled for months.
The math is unforgiving. If your test suite exercises both states of every feature flag, each flag doubles the test matrix for the code it touches. Three independent flags in the same module create 8 test combinations. Five flags create 32. In practice, teams do not test every permutation, but they typically test at least both states of each flag independently, which still means every stale flag adds redundant test runs.
The impact adds up quickly. If your test suite exercises both states of every feature flag, each stale flag keeps at least two flag-specific test runs alive, one of which covers a branch that can never execute. Fifty stale flags can easily mean 100+ extra tests, and when some of those are integration tests running against real services or databases, the time cost grows substantially.
And these are not just any tests. Flag-related tests often involve mocking the flag evaluation service, setting up different application states for each branch, and running assertions against both code paths. Integration tests are particularly expensive: a single stale flag in an API endpoint might trigger duplicate end-to-end test runs that each take 10-30 seconds.
# These tests run on every CI build.
# The flag has been 100% enabled since October.
# Both tests pass. Both are completely pointless.
class TestCheckoutFlow:
    def test_checkout_with_new_payment_v2_enabled(self, mock_flags):
        mock_flags.set("new_payment_flow_v2", True)
        response = self.client.post("/checkout", data=self.valid_order)
        assert response.status_code == 200
        assert "stripe_payment_intent" in response.json()

    def test_checkout_with_new_payment_v2_disabled(self, mock_flags):
        mock_flags.set("new_payment_flow_v2", False)
        response = self.client.post("/checkout", data=self.valid_order)
        assert response.status_code == 200
        assert "legacy_payment_token" in response.json()
The second test is dead weight. It tests a code path that will never execute in production, but it runs on every build, every PR, every merge to main. Across 50 stale flags, this dead testing easily adds 5-10 minutes to your pipeline.
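A back-of-the-envelope model makes the "5-10 minutes" figure concrete. The per-test duration and dead fraction below are illustrative assumptions; substitute your own suite's numbers:

```python
# Sketch: estimated CI time spent running tests for dead flag branches.
# All inputs are illustrative assumptions, not measurements.

def redundant_test_minutes(stale_flags: int,
                           tests_per_flag: int = 2,
                           dead_fraction: float = 0.5,
                           seconds_per_test: float = 12.0) -> float:
    """If each stale flag keeps tests_per_flag tests alive and half of them
    exercise a branch that can never run, estimate wasted minutes per build."""
    dead_tests = stale_flags * tests_per_flag * dead_fraction
    return dead_tests * seconds_per_test / 60

# 50 stale flags, 2 tests each, half covering the dead branch at ~12 s apiece
print(round(redundant_test_minutes(50), 1))  # -> 10.0 minutes per build
```

With slower integration tests in the mix, the `seconds_per_test` assumption rises and the wasted minutes rise with it.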
Docker image bloat
Stale flags directly inflate Docker image sizes. Dead code branches, their dependencies, and their assets all get packaged into your container images. Larger images mean slower pushes to your registry, slower pulls by your orchestrator, and slower cold starts for serverless functions.
# Your production Dockerfile faithfully includes
# everything the build produces, including dead code
COPY --from=builder /app/dist /app/dist
# The /app/dist directory contains:
# - 1.8 MB of active code
# - 400 KB of code behind stale flags that never executes
# - 200 KB of dependencies only used by dead flag branches
For teams deploying to AWS Lambda, where cold start time is directly correlated with package size, the impact is even more acute. Every megabyte of dead code adds 5-15 ms to cold start times. For latency-sensitive services, this is not theoretical---it shows up in P99 latency metrics.
Code review velocity
Stale flags do not just slow automated pipeline stages---they slow humans too. Pull requests that touch code near stale flags force reviewers to mentally parse dead branches, understand flag states, and verify that changes do not break code paths that are not even active.
The effect is predictable: reviews of code near stale flags take longer, require more back-and-forth, and produce less confident approvals, because reviewers must reason about flag states they do not fully understand.
When reviewers encounter code like this, they have to answer questions that the code alone cannot answer:
func ProcessOrder(ctx context.Context, order *Order) error {
    if flags.IsEnabled("order_validation_v3") {
        if err := validateOrderV3(ctx, order); err != nil {
            return fmt.Errorf("validation failed: %w", err)
        }
    } else {
        if err := validateOrderLegacy(order); err != nil {
            return err
        }
    }

    // Is this flag still relevant?
    // Which branch is active in production?
    // Do I need to update both branches?
    // What happens if someone toggles this flag?
    if flags.IsEnabled("new_pricing_engine") {
        order.Total = calculatePricingV2(ctx, order)
    } else {
        order.Total = calculateLegacyPricing(order)
    }

    return persistOrder(ctx, order)
}
A reviewer seeing this function must now verify the behavior of four possible flag combinations, even if only one combination is active in production. The review takes longer, comments are less precise, and subtle bugs in the dead paths go unnoticed because reviewers focus their attention on the active paths.
Quantifying the full pipeline impact
Taken individually, each of these effects seems minor. A few percent here, a minute there. But pipeline time is not experienced in isolation---it compounds across every developer, every PR, and every day.
The aggregate impact is what matters. Add up the build time overhead, the redundant test execution, the Docker image bloat, and the slower code reviews across every developer, every PR, and every day, and the cost becomes significant. For a mid-size engineering team, the cumulative hours lost to stale-flag pipeline drag can easily amount to multiple full-time engineers' worth of wasted capacity per year.
This does not account for the context-switching cost when developers wait for slow builds. Research on workplace interruptions shows that recovering focus after a context switch takes significant time, and slow CI pipelines force these context switches multiple times per day.
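A simple cost model shows how the drag converts into engineer-years. Every input below is an illustrative assumption for a hypothetical mid-size team; plug in your own numbers:

```python
# Sketch: annual cost of stale-flag pipeline drag, in engineer-years of waiting.
# Every input is an illustrative assumption, not a measurement.

def wasted_engineer_years(engineers: int = 30,
                          builds_per_dev_per_day: int = 4,
                          extra_minutes_per_build: float = 8.0,
                          working_days: int = 230,
                          hours_per_engineer_year: float = 1800.0) -> float:
    """Extra pipeline minutes, summed across the team for a year,
    expressed as full-time-engineer-years of waiting."""
    wasted_hours = (engineers * builds_per_dev_per_day *
                    extra_minutes_per_build / 60 * working_days)
    return wasted_hours / hours_per_engineer_year

# 30 engineers, 4 builds a day, 8 extra minutes per build
print(round(wasted_engineer_years(), 2))  # -> 2.04 engineer-years per year
```

Note this counts only raw waiting time; the context-switching cost described next comes on top.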
The compounding effect: why it gets worse over time
Stale flags do not just cause a fixed amount of pipeline degradation. The problem compounds because flags interact with each other and with the codebase in non-linear ways.
Flag accumulation rate outpaces cleanup
Most teams create flags faster than they remove them. In our experience, the typical engineering team creates flags at a rate that significantly outpaces cleanup, leading to net accumulation month over month. That means pipeline degradation accelerates quarter over quarter.
Without active cleanup, the stale flag count grows steadily. Each quarter adds more dead code to your builds, more redundant tests to your suite, and more bloat to your images. By the end of a year without cleanup, your pipeline can be materially slower than it was at the start, and the degradation will not plateau because the accumulation does not plateau.
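The accumulation trajectory is easy to model. The creation and removal rates below are illustrative assumptions in line with the pattern described above:

```python
# Sketch: net stale-flag accumulation when creation outpaces cleanup.
# The monthly rates are illustrative assumptions, not measurements.

def stale_flags_after(months: int,
                      created_per_month: int = 12,
                      removed_per_month: int = 4) -> int:
    """Linear net accumulation: each month adds created - removed flags."""
    return months * (created_per_month - removed_per_month)

for quarter in range(1, 5):
    print(f"After Q{quarter}: {stale_flags_after(quarter * 3)} stale flags")
# After Q4: 96 stale flags, with no plateau in sight
```

Even with modest rates, a year of net accumulation leaves a flag count large enough to produce the build, test, and image effects described in the earlier sections.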
Flags create maintenance gravity
Stale flags make code harder to modify, which makes developers less likely to clean up the surrounding code, which makes more code stale. A developer who encounters a function with three stale flags and wants to add a small feature will route around the complexity rather than clean it up. The function grows. More flags accumulate. The build processes more dead code. The pipeline slows further.
Flag dependencies create hidden coupling
Flags that depend on other flags---where one flag's behavior changes based on another flag's state---create exponential complexity. When either flag becomes stale, the interaction is nearly impossible to reason about. Teams leave both flags in place "just in case," doubling the dead code and the test burden.
The fix: systematic flag cleanup and its measurable impact
The good news is that flag cleanup produces immediate, measurable improvements to pipeline speed. Unlike many performance optimizations that require architectural changes, removing stale flags is purely subtractive work---you are deleting code, and the pipeline gets faster.
The impact of flag cleanup on pipeline speed
Teams that conduct systematic flag cleanup consistently report meaningful improvements across the board: faster build times, shorter test suites, smaller Docker images, quicker code reviews, and reduced serverless cold start times. The exact numbers vary depending on the codebase and how many flags are removed, but the improvements are immediate because every build, test run, and Docker build benefits from processing less code.
In the case study covered in our post on removing 500 stale flags, the team saw a 20% reduction in CI build time, a 37% reduction in test suite duration, and a 29% increase in deployment frequency.
Practical cleanup approaches
Manual audit and removal. The most straightforward approach: inventory all flags, identify which are stale (held at 100% rollout, or fully disabled, for 30+ days), and create PRs to remove each one. This works for small flag counts (under 20) but does not scale: manually removing a single flag typically takes 30-60 minutes including code changes, test updates, and review.
Scheduled cleanup sprints. Dedicate a fixed time each quarter (a "flag cleanup day" or a sprint rotation) to removing stale flags. This creates cultural momentum but relies on discipline and prioritization against feature work.
Automated detection and removal. The most scalable approach: use AST-based analysis to detect stale flags in your codebase and automatically generate cleanup PRs. Tools that use tree-sitter parsing can understand the syntax structure across multiple languages and safely remove flag conditionals, dead branches, and orphaned imports.
This is the approach FlagShark takes---it monitors your GitHub repositories for flag changes in every PR, tracks flag lifecycle from creation through full rollout, and generates cleanup PRs when flags become stale. Because it uses tree-sitter AST parsing rather than regex, it handles complex flag patterns across 11 programming languages without false positives or unsafe removals.
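The staleness criterion from the manual-audit approach above can be sketched in a few lines. The inventory format here is a hypothetical example; real flag platforms expose the same fields through their APIs:

```python
# Sketch: filter a flag inventory down to stale candidates.
# The inventory structure is a hypothetical example for illustration.
from datetime import date, timedelta

def stale_flags(inventory: list[dict], today: date,
                min_age_days: int = 30) -> list[str]:
    """A flag is stale if it has sat at 100% or 0% rollout for min_age_days."""
    cutoff = today - timedelta(days=min_age_days)
    return [
        f["key"] for f in inventory
        if f["rollout"] in (0, 100) and f["last_rollout_change"] <= cutoff
    ]

inventory = [
    {"key": "analytics_dashboard_v2", "rollout": 100,
     "last_rollout_change": date(2024, 10, 1)},   # fully on for months: stale
    {"key": "new_onboarding", "rollout": 40,
     "last_rollout_change": date(2025, 1, 10)},   # mid-rollout: keep
]
print(stale_flags(inventory, today=date(2025, 2, 1)))  # -> ['analytics_dashboard_v2']
```

This filter is the easy part; safely removing the conditional, the dead branch, and the orphaned imports is where AST-based tooling earns its keep.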
The cleanup ROI
For teams considering whether flag cleanup is worth prioritizing against feature work, the math is straightforward: the ongoing cost of stale flags (developer wait time, redundant test execution, cognitive overhead during reviews, larger images and slower deploys) accumulates every single day. The one-time cost of removing a flag is typically 30-60 minutes of engineering time per flag.
Even manual cleanup pays for itself quickly. Automated cleanup, where the tool generates the PR and you just review and merge, pays for itself almost immediately.
Building flag hygiene into your pipeline
The most effective long-term strategy is not periodic cleanup campaigns but continuous flag hygiene integrated into your existing CI/CD workflow.
Step 1: Make flag accumulation visible
Add a flag inventory step to your CI pipeline that reports the current count and age distribution of flags on every build. When the number is visible, teams naturally prioritize cleanup.
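A minimal version of that inventory step, assuming the flag list comes from your flag platform's API (the bucket boundaries are a suggested convention, not a standard):

```python
# Sketch: a CI step that prints flag count and age distribution on every build.
# The age buckets are a suggested convention, not a standard.

def age_report(flag_ages_days: list[int]) -> dict[str, int]:
    """Bucket flags by age so the accumulation trend is visible in CI logs."""
    buckets = {"<30d": 0, "30-90d": 0, ">90d": 0}
    for age in flag_ages_days:
        if age < 30:
            buckets["<30d"] += 1
        elif age <= 90:
            buckets["30-90d"] += 1
        else:
            buckets[">90d"] += 1
    return buckets

ages = [5, 12, 45, 60, 120, 200, 340]  # ages would come from your flag platform
print(f"{len(ages)} flags:", age_report(ages))
```

The ">90d" bucket is the number your team will start competing to shrink once it appears in every build log.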
Step 2: Set expiration expectations
Establish a team norm that release flags have a 30-day maximum lifespan and experiment flags have a 14-day maximum. Enforce these through CI warnings that escalate to blocking checks as flags age.
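The escalation logic can be sketched as a simple CI verdict. The 30-day budget comes from the norm above; the 2x grace window before blocking is an assumption you would tune to your team:

```python
# Sketch: escalate from warning to hard failure as a flag ages past its budget.
# The 2x grace window before blocking is an assumed policy, not a standard.

def check_flag_age(age_days: int, max_age_days: int = 30) -> str:
    """Return the CI verdict: pass within budget, warn past it, fail past 2x."""
    if age_days <= max_age_days:
        return "pass"
    if age_days <= 2 * max_age_days:
        return "warn"   # annotate the build, do not block yet
    return "fail"       # block the merge until the flag is removed

print(check_flag_age(20), check_flag_age(45), check_flag_age(75))  # pass warn fail
```

For experiment flags, the same function with `max_age_days=14` enforces the shorter budget.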
Step 3: Automate the removal
Manual flag removal does not scale. The cognitive effort of identifying stale flags, understanding their code impact, writing safe removal PRs, and updating tests is substantial enough that it perpetually loses priority to feature work. Automated tools that generate cleanup PRs reduce the effort from "create and review a PR" to "review and merge a PR"---a 60-80% reduction in effort that makes cleanup sustainable.
Step 4: Track pipeline metrics alongside flag counts
Correlate your pipeline speed metrics with your flag count over time. When your team can see the direct relationship between "we removed 30 flags last month" and "our build time dropped 4 minutes," the motivation for ongoing cleanup becomes self-sustaining.
Your CI/CD pipeline is not slow because of one big problem. It is slow because of hundreds of small ones, and stale feature flags are responsible for more of those small problems than most teams realize. Every flag left in your codebase after it has served its purpose adds dead code to your builds, redundant runs to your tests, dead weight to your containers, and cognitive overhead to your reviews.
The path from a twenty-three-minute pipeline back to an eight-minute one does not require a heroic rewrite or an expensive infrastructure upgrade. It requires removing the dead code your team has been unconsciously accumulating, one stale flag at a time. Start by counting your stale flags. The number will be higher than you expect, and the pipeline improvement from removing them will be larger than you predict.